Linux wireless drivers development

Linux wireless drivers development
 help / color / mirror / Atom feed

* Re: p54spi: dbm_antsignal value, is it RSSI or dbm?
From: Christian Lamparter @ 2009-09-21 19:04 UTC (permalink / raw)
  To: Andre K; +Cc: linux-wireless
In-Reply-To: <76516bfc0909211119i4362ca35k13fc1f87857fb393@mail.gmail.com>

On Monday 21 September 2009 20:19:35 Andre K wrote:
> Hi all,
> 
> I've been using the new p54spi driver on a couple of n800s. Some specifics:
> Kernel : 2.6.29-omap1
> Wireless-compat: compat-wireless-2009-08-22
> OS: Maemo
> Options:  disabled CONFIG_RFKILL_BACKPORT_LEDS because it didn't compile
> 
> The driver loads and functions normally. However, the values for the
> dbm_antsignal (from RadioTap header), are they RSSI or dbm? Initially
> I thought they were dbm since tcpdump showed negative values, which
> makes sense:
> 
> Tcpdump:
> 14:08:25.312988 4469382526us tsft 2.0 Mb/s 2437 MHz (0x00a0) -49dB
> signal -41dB noise antenna 0 [0x0000000e] Acknowledgment RA:00:1d:e0:3d:5a:53 (oui Unknown)
heh, noise is higher than the signal.

> 14:08:25.313903 4469383345us tsft 1.0 Mb/s 2437 MHz (0x00a0) 4dB
> signal -41dB noise antenna 0 [0x0000000e] 04:c2:08:00:07:07 (oui
> Unknown) Unknown SSAP 0x0a >
> 
> Which shows up the first positive dbm value. Next I set up another
> n800 to broadcast ping packets (using hping3, since busybox's ping
> doesn't allow broadcast).
>
> 14:12:22.066192 13296010370us tsft 1.0 Mb/s 2437 MHz (0x00a0) 87dB
> signal -62dB noise antenna 0 [0x0000000e] IP 192.168.10.11 >
> 192.168.10.255: ICMP echo request, id 6662, seq 40704, length 8
> 14:12:22.092376 13296036864us tsft 18.0 Mb/s 2437 MHz (0x00c0) 82dB
> signal -62dB noise antenna 0 [0x0000000e] IP 192.168.10.11 >
> 192.168.10.255: ICMP echo request, id 6662, seq 40960, length 8
> 14:12:22.094787 13296037678us tsft 1.0 Mb/s 2437 MHz (0x00a0) 112dB
> signal -62dB noise antenna 0 [0x0000000e] IP 192.168.10.11 >
> 192.168.10.255: ICMP echo request, id 6662, seq 40960, length 8
> 
> I get strange dbm values, in the range of 50 - 100dbm, which doesn't
> make much sense.
> 
> Does anyone have an idea what value is being returned and if there is
> an error in the dbm calculation?
>

no, this has been on the TODO list for some time now.

(drivers/net/wireless/p54/txrx.c)
static int p54_rssi_to_dbm(struct p54_common *priv, int rssi)
{
        int band = priv->hw->conf.channel->band;

        if (priv->rxhw != 5)
                return ((rssi * priv->rssical_db[band].mul) / 64 +
                         priv->rssical_db[band].add) / 4;
        else
                /*
                 * TODO: find the correct formula
                 */
                return ((rssi * priv->rssical_db[band].mul) / 64 +
                         priv->rssical_db[band].add) / 4;
}

the .mul & .add values can be retrieved from the p54spi_eeprom.h blob.

0x09, 0x00, 0xad, 0xde,         /* PDR_RSSI_LINEAR_APPROXIMATION_CUSTOM */
        0x0a, 0x01, 0x72, 0xfe, 0x1a, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,

=> translates into: (best viewed with a fixed-width font)
.mul           = 0x010a = 416 (base 10)
.add           = 0xfe72 = -398
.longbow_unkn1 = 0x001a = 26
.longbow_unkn2 = 0x0000 = 0

---

stlc45xx has a completely different phy/rf (longbow - rxhw = 5) and
the specs are not a part of the freely available fw iface documentation.

here's the relevant quote from a Nokia official: "
> If this is so, what's so secret about the values that it needs a
> binary-only tool and hidden storage area? Is there legal issues with
> it (related to the radio device regulations)?

Yes, this is because regulatory requirements. So thank FCC and ETSI
for all this. This causes much grief for all wireless vendors, for
example Intel had to do major changes in iwl3945 firmware for their
mac80211 driver. "
(ref: http://www.mail-archive.com/stlc45xx-devel@garage.maemo.org/msg00021.html )

---

if you need real dBm reading for signal and noise, you have
to some "reverse engineering" on your own or find someone else
with the hardware and some spare time to burn.

Regards,
	Chr

^ permalink raw reply

* p54spi: dbm_antsignal value, is it RSSI or dbm?
From: Andre K @ 2009-09-21 18:19 UTC (permalink / raw)
  To: linux-wireless

Hi all,

I've been using the new p54spi driver on a couple of n800s. Some specifics:
Kernel : 2.6.29-omap1
Wireless-compat: compat-wireless-2009-08-22
OS: Maemo
Options:  disabled CONFIG_RFKILL_BACKPORT_LEDS because it didn't compile

The driver loads and functions normally. However, the values for the
dbm_antsignal (from RadioTap header), are they RSSI or dbm? Initially
I thought they were dbm since tcpdump showed negative values, which
makes sense:

14:05:39.458831 8602124us tsft 1.0 Mb/s 2412 MHz (0x00a0) -46dB signal
-60dB noise antenna 0 [0x0000000e] Beacon (**SSID_REMOVED**) [1.0*
2.0* 5.5* 11.0* 6.0 9.0 12.0 18.0 Mbit] ESS CH: 1, PRIVACY
14:05:39.461090 8603807us tsft 1.0 Mb/s 2412 MHz (0x00a0) -52dB signal
-60dB noise antenna 0 [0x0000000e] Beacon (**SSID_REMOVED**) [1.0*
2.0* 5.5* 11.0* 6.0 12.0 24.0 36.0 Mbit] ESS CH: 1, PRIVACY
14:05:39.506713 8652588us tsft 1.0 Mb/s 2412 MHz (0x00a0) -52dB signal
-60dB noise antenna 0 [0x0000000e] Beacon (**SSID_REMOVED**) [1.0*
2.0* 5.5* 11.0* Mbit] ESS CH: 1, PRIVACY
14:05:39.541900 8687877us tsft 1.0 Mb/s 2412 MHz (0x00a0) -47dB signal
-60dB noise antenna 0 [0x0000000e] Beacon (**SSID_REMOVED**)[|802.11]

Output of iwconfig:
          wlan0     IEEE 802.11bg  Mode:Monitor  Tx-Power=27 dBm
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Power Management:off

However, when I select a channel (6 in this case) its starts to behave
strangely:
Output of iwconfig:
          wlan0     IEEE 802.11bg  Mode:Monitor  Frequency:2.437 GHz
Tx-Power=27 dBm
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Power Management:off

Tcpdump:
14:08:25.312988 4469382526us tsft 2.0 Mb/s 2437 MHz (0x00a0) -49dB
signal -41dB noise antenna 0 [0x0000000e] Acknowledgment
RA:00:1d:e0:3d:5a:53 (oui Unknown)
14:08:25.313903 4469383345us tsft 1.0 Mb/s 2437 MHz (0x00a0) 4dB
signal -41dB noise antenna 0 [0x0000000e] 04:c2:08:00:07:07 (oui
Unknown) Unknown SSAP 0x0a >

Which shows up the first positive dbm value. Next I set up another
n800 to broadcast ping packets (using hping3, since busybox's ping
doesn't allow broadcast).

14:12:22.066192 13296010370us tsft 1.0 Mb/s 2437 MHz (0x00a0) 87dB
signal -62dB noise antenna 0 [0x0000000e] IP 192.168.10.11 >
192.168.10.255: ICMP echo request, id 6662, seq 40704, length 8
14:12:22.092376 13296036864us tsft 18.0 Mb/s 2437 MHz (0x00c0) 82dB
signal -62dB noise antenna 0 [0x0000000e] IP 192.168.10.11 >
192.168.10.255: ICMP echo request, id 6662, seq 40960, length 8
14:12:22.094787 13296037678us tsft 1.0 Mb/s 2437 MHz (0x00a0) 112dB
signal -62dB noise antenna 0 [0x0000000e] IP 192.168.10.11 >
192.168.10.255: ICMP echo request, id 6662, seq 40960, length 8

I get strange dbm values, in the range of 50 - 100dbm, which doesn't
make much sense.

Does anyone have an idea what value is being returned and if there is
an error in the dbm calculation?

Thanks,
Andre.

P.S. Would be happy to provide more info. and feedback.

^ permalink raw reply

* Re: [PATCH] mac80211: Fix [re]association power saving issue on AP side
From: Johannes Berg @ 2009-09-21 17:19 UTC (permalink / raw)
  To: Igor Perminov; +Cc: John W. Linville, Jouni Malinen, linux-wireless
In-Reply-To: <1253200137.26765.338.camel@sunlight>

[-- Attachment #1: Type: text/plain, Size: 1068 bytes --]

Hi,

Jouni and I just discussed this and noticed that there are even more
things broken that your patch doesn't fix either, so we will need just
the last hunk of your patch (quoted below), and a one-line fix in
hostapd.

And then we can also remove the workaround from net/mac80211/cfg.c (the
entire EEXIST branch).

Thanks,
johannes

> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -367,7 +367,10 @@ ieee80211_tx_h_unicast_ps_buf(struct ieee80211_tx_data *tx)
>         struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)tx->skb->data;
>         u32 staflags;
>  
> -       if (unlikely(!sta || ieee80211_is_probe_resp(hdr->frame_control)))
> +       if (unlikely(!sta || ieee80211_is_probe_resp(hdr->frame_control)
> +                       || ieee80211_is_auth(hdr->frame_control)
> +                       || ieee80211_is_assoc_resp(hdr->frame_control)
> +                       || ieee80211_is_reassoc_resp(hdr->frame_control)))
>                 return TX_CONTINUE;
>  
>         staflags = get_sta_flags(sta);
> 
> 
> 

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: Problems with "cfg80211: fix SME connect" commit
From: Albert Herranz @ 2009-09-21 16:11 UTC (permalink / raw)
  To: Holger Schurig; +Cc: johannes, linville, linux-wireless
In-Reply-To: <200909210845.51247.hs4233@mail.mn-solutions.de>

Holger Schurig wrote:
> Can you try "[PATCH] cfg80211: don't overwrite privacy setting" 
> from [1]?
> 
> [1] http://marc.info/?l=linux-wireless&m=125323296617306&w=2
> 
> 
> And can you use the mailing list archives?  This is now the third 
> time this poppep up, I just copied Sedat's mail for you.
> 

Hi,

Thanks for the information.

Adding back "cfg80211: fix SME connect" and applying "cfg80211: don't overwrite privacy setting" fixes the connection issue, but with a introduces a small difference vs the previous working version.
There is now an extra "deauthenticating by local choice (reason=3)" message in the logs.

* master-20090914

[   13.222940] b43-phy0 debug: Chip initialized
[   13.291359] b43-phy0 debug: PIO initialized
[   13.330116] b43-phy0 debug: QoS disabled
[   13.861636] b43-phy0 debug: Wireless interface started
[   13.969153] b43-phy0 debug: Adding Interface type 2
[   16.679249] wlan1: direct probe to AP 00:12:17:15:e7:79 (try 1)
[   16.700998] wlan1 direct probe responded
[   16.707013] wlan1: authenticate with AP 00:12:17:15:e7:79 (try 1)
[   16.720205] wlan1: authenticated
[   16.726261] wlan1: associate with AP 00:12:17:15:e7:79 (try 1)
[   16.740697] wlan1: RX AssocResp from 00:12:17:15:e7:79 (capab=0x431 status=0 aid=1)
[   16.758042] wlan1: associated

* master-20090916 + "cfg80211: don't overwrite privacy setting"

[   12.849778] b43-phy0 debug: Chip initialized
[   12.865561] b43-phy0 debug: PIO initialized
[   12.872482] b43-phy0 debug: QoS disabled
[   13.053373] b43-phy0 debug: Wireless interface started
[   13.218613] b43-phy0 debug: Adding Interface type 2
[   15.832582] wlan1: deauthenticating by local choice (reason=3)
[   16.131599] wlan1: direct probe to AP 00:12:17:15:e7:79 (try 1)
[   16.145589] wlan1 direct probe responded
[   16.154501] wlan1: authenticate with AP 00:12:17:15:e7:79 (try 1)
[   16.175640] wlan1: authenticated
[   16.181829] wlan1: associate with AP 00:12:17:15:e7:79 (try 1)
[   16.198990] wlan1: RX AssocResp from 00:12:17:15:e7:79 (capab=0x431 status=0 aid=1)
[   16.210791] wlan1: associated

Any comments on this?

Thanks,
Albert

^ permalink raw reply

* Re: BUG?  I can reproduce "Association request to the driver failed" at will
From: Holger Schurig @ 2009-09-21 15:31 UTC (permalink / raw)
  To: linux-wireless; +Cc: hostap
In-Reply-To: <200909211309.40476.hs4233@mail.mn-solutions.de>

Hmm, this seems to be racy. This morning I succeeded 3 times to 
get that message, now I didn't on the first try. But now I 
enabled verbose-mac80211-debugging ...

But on the second try, I succeeded again.

> Let wpa_supplicant associate.

phy10: Selected rate control algorithm 'minstrel'
ath5k phy10: Atheros AR5213A chip found (MAC: 0x59, PHY: 0x43)
ath5k phy10: RF2112B 2GHz radio found (0x46)
eth1: direct probe to AP 00:13:19:80:da:30 (try 1)
eth1 direct probe responded
eth1: authenticate with AP 00:13:19:80:da:30 (try 1)
eth1: authenticated
eth1: associate with AP 00:13:19:80:da:30 (try 1)
eth1: RX AssocResp from 00:13:19:80:da:30 (capab=0x11 status=0 
aid=114)
eth1: associated

> Turn one AP off. wpa_supplicant automatically associates to the
> second. 

eth1: deauthenticated from 00:13:19:80:da:30 (Reason: 1)
eth1: direct probe to AP 00:13:19:80:da:30 (try 1)
eth1: direct probe to AP 00:13:19:80:da:30 (try 2)
eth1: direct probe to AP 00:13:19:80:da:30 (try 3)
eth1: direct probe to AP 00:13:19:80:da:30 timed out
eth1: direct probe to AP 00:1b:d4:44:35:90 (try 1)
eth1 direct probe responded
eth1: authenticate with AP 00:1b:d4:44:35:90 (try 1)
eth1: authenticated
eth1: associate with AP 00:1b:d4:44:35:90 (try 1)
eth1: RX AssocResp from 00:1b:d4:44:35:90 (capab=0x11 status=0 
aid=127)
eth1: associated

Side node: here mac80211 had stale scan data in the cache, thus 
wpa_supplicant first asked it to authenticate to the 
just-turned-off AP. You can see the three probes ...


> Now turn the first AP on again. Issue a manual scan command

eth1: direct probe to AP 00:13:19:80:da:30 (try 1)
eth1 direct probe responded
eth1: authenticate with AP 00:13:19:80:da:30 (try 1)
eth1: authenticate with AP 00:13:19:80:da:30 (try 2)
eth1: authenticated

-- 
http://www.holgerschurig.de

^ permalink raw reply

* Re: BUG: can bring wpa_supplicant/mac80211 into a stuck state at will
From: Holger Schurig @ 2009-09-21 15:22 UTC (permalink / raw)
  To: hostap; +Cc: linux-wireless
In-Reply-To: <200909211259.45431.hs4233@mail.mn-solutions.de>

Johannes asked me off-list for dmesg output:


> First, it associates to the MNHS-AP:

eth1: direct probe to AP 00:13:19:80:da:30 (try 1)
eth1 direct probe responded
eth1: authenticate with AP 00:13:19:80:da:30 (try 1)
eth1: authenticated
eth1: associate with AP 00:13:19:80:da:30 (try 1)
eth1: RX AssocResp from 00:13:19:80:da:30 (capab=0x11 status=0 
aid=112)
eth1: associate



> Next I turn of the AP where I'm associated to. wpa_supplicant
> successfully associates to the WEP-based AP:

eth1: deauthenticated from 00:13:19:80:da:30 (Reason: 1)
eth1: direct probe to AP 00:1b:53:11:dc:40 (try 1)
eth1 direct probe responded
eth1: authenticate with AP 00:1b:53:11:dc:40 (try 1)
eth1: authenticated
eth1: associate with AP 00:1b:53:11:dc:40 (try 1)
eth1: RX AssocResp from 00:1b:53:11:dc:40 (capab=0x11 status=0 
aid=91)
eth1: associated


> Now I turn on again the MNHS AP. And then I issue a scan
> from the cmdline: "iw eth1 scan trigger freq 2412"

eth1: direct probe to AP 00:13:19:80:da:30 (try 1)
eth1 direct probe responded
eth1: authenticate with AP 00:13:19:80:da:30 (try 1)
eth1: authenticated


Also the output of "iw eth1 info":

iw eth1 link
Connected to 00:1b:53:11:dc:40 (on eth1)
        SSID: MNFUNK
        freq: 2412
Authenticated with 00:13:19:80:da:30 (on eth1)
        RX: 78173 bytes (722 packets)
        TX: 144 bytes (4 packets)
        signal: -67 dBm
        tx bitrate: 11.0 MBit/s



If I understand this right, then mac80211 thinks it's connected, 
even before the 4-way key exchange?



-- 
http://www.holgerschurig.de

^ permalink raw reply

* Re: ipw2200: firmware DMA loading rework
From: Mel Gorman @ 2009-09-21 13:37 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Pekka Enberg, Luis R. Rodriguez, Tso Ted, Aneesh Kumar K.V,
	Zhu Yi, Andrew Morton, Johannes Weiner, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
	netdev@vger.kernel.org, linux-mm@kvack.org, James Ketrenos,
	Chatre, Reinette, linux-wireless@vger.kernel.org,
	ipw2100-devel@lists.sourceforge.net
In-Reply-To: <200909211512.14785.bzolnier@gmail.com>

On Mon, Sep 21, 2009 at 03:12:14PM +0200, Bartlomiej Zolnierkiewicz wrote:
> On Monday 21 September 2009 12:56:48 Pekka Enberg wrote:
> > On Mon, 2009-09-21 at 12:46 +0200, Bartlomiej Zolnierkiewicz wrote:
> > > > > I don't know why people don't see it but for me it has a memory management
> > > > > regression and reliability issue written all over it.
> > > > 
> > > > Possibly but drivers that reload their firmware as a response to an
> > > > error condition is relatively new and loading network drivers while the
> > > > system is already up and running a long time does not strike me as
> > > > typical system behaviour.
> > > 
> > > Loading drivers after boot is a typical desktop/laptop behavior, please
> > > think about hotplug (the hardware in question is an USB dongle).
> > 
> > Yeah, I wonder what broke things. Did the wireless stack change in
> > 2.6.31-rc1 too? IIRC Mel ruled out page allocator changes as a suspect.
> 
> The thing is that the mm behavior change has been narrowed down already
> over a month ago to -mm merge in 2.6.31-rc1 (as has been noted in my initial
> reports), I first though that that it was -next breakage but it turned out
> that it came the other way around (because -mm is not even pulled into -next
> currently -- great way to set an example for other kernel maintainers BTW).
> 

Is there a reliable reproduction case for this that narrowed it down to
2.6.31-rc1? That is the window where a number of page-allocator optimisation
patches made it in. None of them should have affected the allocator from a
fragmentation perspective though.

If you have a reliable reproduction case, testing between commits
d239171e4f6efd58d7e423853056b1b6a74f1446..a1dd268cf6306565a31a48deff8bf4f6b4b105f7
would be nice, particularly if it can be bisected within that small
window rather than a full bisect of an rc1 which I know can be a major
mess.

> I understand that behavior change may be justified and technically correct
> in itself.  I also completely agree that high order allocations in certain
> drivers need fixing anyway.
> 
> However there is something wrong with the big picture and the way changes
> are happening.  I'm not saying that I'm surprised though, especially given
> the recent decline in the quality assurance and the paradigm shift that
> I'm seeing (some influential top level people talking that -rc1 is fine for
> testing new code now or the "new kernel new hardware" thing).
> 

The quality assurance comment is a bit unfair with respect to the page
allocator. There are a lot of things that can have changed that would hose
order-6 atomic allocations. Furthermore, test cases used for mm patches
would not have taken into account such allocations as being critical. Even
if it was considered, it would have been dismissed as "it makes no sense
for drivers to be doing order-6 GFP_ATOMIC" allocations.

> Sorry but I have no more time currently to narrow down the issue some more
> (guess what, there are other kernel bugs standing in the way to bisect it
> and I would have to provide some reliable way to reproduce it first) so I
> see no more point in wasting people's time on this.  I can certainly get by
> with allocation failure here and there.  Not a big deal for me personally..
> 

That is somewhat unfortunate. Even testing within the window above if
possible would be very helpful if you get the chance.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* Re: ipw2200: firmware DMA loading rework
From: Bartlomiej Zolnierkiewicz @ 2009-09-21 13:12 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Mel Gorman, Luis R. Rodriguez, Tso Ted, Aneesh Kumar K.V, Zhu Yi,
	Andrew Morton, Johannes Weiner, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
	netdev@vger.kernel.org, linux-mm@kvack.org, James Ketrenos,
	Chatre, Reinette, linux-wireless@vger.kernel.org,
	ipw2100-devel@lists.sourceforge.net
In-Reply-To: <1253530608.5216.17.camel@penberg-laptop>

On Monday 21 September 2009 12:56:48 Pekka Enberg wrote:
> On Mon, 2009-09-21 at 12:46 +0200, Bartlomiej Zolnierkiewicz wrote:
> > > > I don't know why people don't see it but for me it has a memory management
> > > > regression and reliability issue written all over it.
> > > 
> > > Possibly but drivers that reload their firmware as a response to an
> > > error condition is relatively new and loading network drivers while the
> > > system is already up and running a long time does not strike me as
> > > typical system behaviour.
> > 
> > Loading drivers after boot is a typical desktop/laptop behavior, please
> > think about hotplug (the hardware in question is an USB dongle).
> 
> Yeah, I wonder what broke things. Did the wireless stack change in
> 2.6.31-rc1 too? IIRC Mel ruled out page allocator changes as a suspect.

The thing is that the mm behavior change has been narrowed down already
over a month ago to -mm merge in 2.6.31-rc1 (as has been noted in my initial
reports), I first though that that it was -next breakage but it turned out
that it came the other way around (because -mm is not even pulled into -next
currently -- great way to set an example for other kernel maintainers BTW).

I understand that behavior change may be justified and technically correct
in itself.  I also completely agree that high order allocations in certain
drivers need fixing anyway.

However there is something wrong with the big picture and the way changes
are happening.  I'm not saying that I'm surprised though, especially given
the recent decline in the quality assurance and the paradigm shift that
I'm seeing (some influential top level people talking that -rc1 is fine for
testing new code now or the "new kernel new hardware" thing).

Sorry but I have no more time currently to narrow down the issue some more
(guess what, there are other kernel bugs standing in the way to bisect it
and I would have to provide some reliable way to reproduce it first) so I
see no more point in wasting people's time on this.  I can certainly get by
with allocation failure here and there.  Not a big deal for me personally..

^ permalink raw reply

* BUG?  I can reproduce "Association request to the driver failed" at will
From: Holger Schurig @ 2009-09-21 11:09 UTC (permalink / raw)
  To: linux-wireless; +Cc: hostap

The following is an indication about yet another MLME problem:

Two APs, both set to the MNHS/WPA.

Let wpa_supplicant associate. Turn one AP off. wpa_supplicant
automatically associates to the second.

Now turn the first AP on again. Issue a manual scan command:

1253527140.577702: nl80211: Event message available
1253527140.577751: nl80211: Ignored unknown event (cmd=33)
1253527140.794448: nl80211: Event message available
1253527140.794490: nl80211: New scan results available
1253527140.794921: Received scan results (3 BSSes)
1253527140.794947: CTRL-EVENT-SCAN-RESULTS
1253527140.794960: Selecting BSS from priority group 0
1253527140.794972: Try to find WPA-enabled AP
1253527140.794984: 0: 00:13:19:80:da:30 ssid='MNWPA' wpa_ie_len=24 rsn_ie_len=0 caps=0x11
1253527140.795007:    selected based on WPA IE
1253527140.795018:    selected WPA AP 00:13:19:80:da:30 ssid='MNWPA'
1253527140.795035: Automatic auth_alg selection: 0x1
1253527140.795052: WPA: using IEEE 802.11i/D3.0
1253527140.795063: WPA: Selected cipher suites: group 8 pairwise 8 key_mgmt 2 proto 1
1253527140.795079: WPA: set AP WPA IE - hexdump(len=26): dd 18 00 50 f2 01 01 00 00 50 f2 02 01 00 00 50 f2 02 01 00 00 50 f2 02 28 00
1253527140.795117: WPA: clearing AP RSN IE
1253527140.795129: WPA: using GTK TKIP
1253527140.795141: WPA: using PTK TKIP
1253527140.795154: WPA: using KEY_MGMT WPA-PSK
1253527140.795165: WPA: Set own WPA IE default - hexdump(len=24): dd 16 00 50 f2 01 01 00 00 50 f2 02 01 00 00 50 f2 02 01 00 00 50 f2 02
1253527140.795203: Cancelling scan request
1253527140.795221: Trying to authenticate with 00:13:19:80:da:30 (SSID='MNWPA' freq=2412 MHz)
1253527140.795234: nl_set_encr: ifindex=34 alg=0 addr=0x8092a8e key_idx=0 set_tx=0 seq_len=0 key_len=0
1253527140.795286: nl_set_encr: ifindex=34 alg=0 addr=0x8092a8e key_idx=1 set_tx=0 seq_len=0 key_len=0
1253527140.795341: nl_set_encr: ifindex=34 alg=0 addr=0x8092a8e key_idx=2 set_tx=0 seq_len=0 key_len=0
1253527140.795389: nl_set_encr: ifindex=34 alg=0 addr=0x8092a8e key_idx=3 set_tx=0 seq_len=0 key_len=0
1253527140.795435: nl_set_encr: ifindex=34 alg=0 addr=0x91ce96c key_idx=0 set_tx=0 seq_len=0 key_len=0
1253527140.795460:    addr=00:13:19:80:da:30
1253527140.795498: State: COMPLETED -> AUTHENTICATING
1253527140.795515: EAPOL: External notification - EAP success=0
1253527140.795530: EAPOL: External notification - EAP fail=0
1253527140.795542: EAPOL: External notification - portControl=Auto
1253527140.795559: nl80211: Authenticate (ifindex=34)
1253527140.795572:   * bssid=00:13:19:80:da:30
1253527140.795587:   * freq=2412
1253527140.795598:   * SSID - hexdump_ascii(len=5):
     4d 4e 57 50 41                                    MNWPA
1253527140.795630:   * IEs - hexdump(len=0): [NULL]
1253527140.795643:   * Auth Type 0
1253527140.795683: nl80211: Authentication request send successfully
1253527140.795707: RTM_NEWLINK: operstate=1 ifi_flags=0x11043 ([UP][RUNNING][LOWER_UP])
1253527140.795725: RTM_NEWLINK, IFLA_IFNAME: Interface 'eth1' added
1253527140.953161: nl80211: Event message available
1253527140.953200: nl80211: MLME event 37
1253527140.953216: SME: Authentication response: peer=00:13:19:80:da:30 auth_type=0 status_code=0
1253527140.953243: Trying to associate with 00:13:19:80:da:30 (SSID='MNWPA' freq=2412 MHz)
1253527140.953257: State: AUTHENTICATING -> ASSOCIATING
1253527140.953270: wpa_driver_nl80211_set_operstate: operstate 1->0 (DORMANT)
1253527140.953284: nl80211: Operstate: linkmode=-1, operstate=5
1253527140.953901: nl80211: Associate (ifindex=34)
1253527140.953919:   * bssid=00:13:19:80:da:30
1253527140.953934:   * freq=2412
1253527140.953945:   * SSID - hexdump_ascii(len=5):
     4d 4e 57 50 41                                    MNWPA
1253527140.953977:   * IEs - hexdump(len=24): dd 16 00 50 f2 01 01 00 00 50 f2 02 01 00 00 50 f2 02 01 00 00 50 f2 02
1253527140.954064: nl80211: MLME command failed: ret=-114 (Operation already in progress)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1253527140.954086: Association request to the driver failed
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1253527140.954113: RTM_NEWLINK: operstate=0 ifi_flags=0x11003 ([UP][LOWER_UP])
1253527140.954131: RTM_NEWLINK, IFLA_IFNAME: Interface 'eth1' added


The "Association request to the driver failed" will
be shown even without "-d". Also note that the association
seems to actually work, e.g. I can ping throught the
connection.

-- 
http://www.holgerschurig.de

^ permalink raw reply

* Re: ipw2200: firmware DMA loading rework
From: Mel Gorman @ 2009-09-21 11:02 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Luis R. Rodriguez, Tso Ted, Aneesh Kumar K.V, Zhu Yi,
	Andrew Morton, Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
	netdev@vger.kernel.org, linux-mm@kvack.org, James Ketrenos,
	Chatre, Reinette, linux-wireless@vger.kernel.org,
	ipw2100-devel@lists.sourceforge.net
In-Reply-To: <200909211246.34774.bzolnier@gmail.com>

On Mon, Sep 21, 2009 at 12:46:34PM +0200, Bartlomiej Zolnierkiewicz wrote:
> > > > > <SNIP>
> > > > >
> > > > > This time it is an order-6 page allocation failure for rt2870sta
> > > > > (w/ upcoming driver changes) and Linus' tree from few days ago..
> > > > > 
> > > > 
> > > > It's another high-order atomic allocation which is difficult to grant.
> > > > I didn't look closely, but is this the same type of thing - large allocation
> > > > failure during firmware loading? If so, is this during resume or is the
> > > > device being reloaded for some other reason?
> > > 
> > > Just modprobing the driver on a system running for some time.
> > > 
> > 
> > Was this a common situation before?
> 
> Yes, just like firmware restarts with ipw2200.
> 
> > > > I suspect that there are going to be a few of these bugs cropping up
> > > > every so often where network devices are assuming large atomic
> > > > allocations will succeed because the "only time they happen" is during
> > > > boot but these days are happening at runtime for other reasons.
> > > 
> > > I wouldn't go so far as calling a normal order-6 (256kB) allocation on
> > > 512MB machine with 1024MB swap a bug.  Moreover such failures just never
> > > happened before 2.6.31-rc1.
> > 
> > It's not that normal, it's an allocation that cannot sleep and cannot
> > reclaim. Why is something like firmware loading allocating memory like
> 
> OK.
> 
> > that? Is this use of GFP_ATOMIC relatively recent or has it always been
> > that way?
> 
> It has always been like that.
> 

Nuts, why is firmware loading depending on GFP_ATOMIC?

> > > I don't know why people don't see it but for me it has a memory management
> > > regression and reliability issue written all over it.
> > > 
> > 
> > Possibly but drivers that reload their firmware as a response to an
> > error condition is relatively new and loading network drivers while the
> > system is already up and running a long time does not strike me as
> > typical system behaviour.
> 
> Loading drivers after boot is a typical desktop/laptop behavior, please
> think about hotplug (the hardware in question is an USB dongle).
> 

In that case, how reproducible is this problem so it can be
bisected? Basically, there are no guarantees that GFP_ATOMIC allocations
of this order will succeed although you can improve the odds by increasing
min_free_kbytes. Network drivers should never have been depending on GFP_ATOMIC
succeeding like this but the hole has been dug now.

If it's happening more frequently now than it used to then either

1. The allocations are occuring more frequently where as previously a
   pool might have been reused or the memory not freed for the lifetime of
   the system.

2. Something has changed in the allocator. I'm not aware of recent
   changes that could cause this though in such a recent time-frame.

3. Something has changed recently with respect to reclaim. There have
   been changes made recently to lumpy reclaim and that might be impacting
   kswapd's efforts at keeping large contiguous regions free.

4. Hotplug events that involve driver loads are more common now than they
   were previously for some reason. You mention that this is a USB dongle for
   example. Was it a case before that the driver loaded early and remained
   resident but only active after a hotplug event? If that was the case,
   the memory would be allocated once at boot. However, if an optimisation
   made recently unloads those unused drivers and re-loads them later, there
   would be more order-6 allocations than they were previously and manifest
   as these bug reports. Is this a possibility?

The ideal would be that network drivers not make allocations like this
in the first place by, for example, DMAing the firmware across in
page-size chunks instead of one contiguous lump :/

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* BUG: can bring wpa_supplicant/mac80211 into a stuck state at will
From: Holger Schurig @ 2009-09-21 10:59 UTC (permalink / raw)
  To: hostap; +Cc: linux-wireless

Steps to reproduce:

network={
        ssid="MNHS"
        key_mgmt=NONE
        proto=RSN
        key_mgmt=WPA-PSK
        pairwise=CCMP
        group=CCMP
        psk="99999999"
        scan_freq=2412
}

network={
        ssid="MNFUNK"
        key_mgmt=NONE
        wep_key0="99999"
        scan_freq=2412
}

Also two APs, MNHS/WPA2 and MNFUNK/WEP.

Current wireless-testing from GIT + patch "cfg80211: don't
overwrite privacy setting"

Current wpa_supplicant from GIT.

Command: ./wpa_supplicant -i eth1 -D nl80211 -t -c mnhs.conf -d


First, it associates to the MNHS-AP:

...
1253526128.147011:    selected based on RSN IE
1253526128.147024:    selected WPA AP 00:13:19:80:da:30 ssid='MNHS'
...
1253526128.436265: Associated to a new BSS: BSSID=00:13:19:80:da:30
1253526128.436280: No keys have been configured - skip key clearing
1253526128.436297: Associated with 00:13:19:80:da:30
...
1253526128.470149: CTRL-EVENT-CONNECTED - Connection to 00:13:19:80:da:30 completed (auth) [id=0 id_str=]
...

Fine so far.



Next I turn of the AP where I'm associated to. wpa_supplicant
successfully associates to the WEP-based AP:

...
1253526139.247356: Setting scan request: 0 sec 100000 usec
1253526139.247373: Added BSSID 00:13:19:80:da:30 into blacklist
...
1253526139.727196:    selected non-WPA AP 00:1b:53:11:dc:40 ssid='MNFUNK'
...
1253526140.025300: Associated to a new BSS: BSSID=00:1b:53:11:dc:40
1253526140.025320: Associated with 00:1b:53:11:dc:40
...
1253526140.025654: CTRL-EVENT-CONNECTED - Connection to 00:1b:53:11:dc:40 completed (reauth) [id=1 id_str=]
...



Now I turn on again the MNHS AP. And then I issue a scan
from the cmdline: "iw eth1 scan trigger freq 2412"

And now it's getting interesting:

...
1253526175.739000: Received scan results (3 BSSes)
...

1253526175.739064: 0: 00:13:19:80:da:30 ssid='MNHS' wpa_ie_len=0 rsn_ie_len=20 caps=0x11
1253526175.739087:    selected based on RSN IE
1253526175.739098:    selected WPA AP 00:13:19:80:da:30 ssid='MNHS'
1253526175.739115: Automatic auth_alg selection: 0x1
1253526175.739133: RSN: using IEEE 802.11i/D9.0
1253526175.739144: WPA: Selected cipher suites: group 16 pairwise 16 key_mgmt 2 proto 2
1253526175.739160: WPA: clearing AP WPA IE
1253526175.739171: WPA: set AP RSN IE - hexdump(len=22): 30 14 01 00 00 0f ac 04 01 00 00 0f ac 04 01 00 00 0f ac 02 28 00
1253526175.739206: WPA: using GTK CCMP
1253526175.739219: WPA: using PTK CCMP
1253526175.739231: WPA: using KEY_MGMT WPA-PSK
1253526175.739243: WPA: Set own WPA IE default - hexdump(len=22): 30 14 01 00 00 0f ac 04 01 00 00 0f ac 04 01 00 00 0f ac 02 00 00
1253526175.739280: Cancelling scan request
1253526175.739298: Trying to authenticate with 00:13:19:80:da:30 (SSID='MNHS' freq=2412 MHz)
1253526175.739312: nl_set_encr: ifindex=34 alg=0 addr=0x8092a8e key_idx=0 set_tx=0 seq_len=0 key_len=0
1253526175.739370: nl_set_encr: ifindex=34 alg=0 addr=0x8092a8e key_idx=1 set_tx=0 seq_len=0 key_len=0
1253526175.739420: nl_set_encr: ifindex=34 alg=0 addr=0x8092a8e key_idx=2 set_tx=0 seq_len=0 key_len=0
1253526175.739466: nl_set_encr: ifindex=34 alg=0 addr=0x8092a8e key_idx=3 set_tx=0 seq_len=0 key_len=0
1253526175.739512: nl_set_encr: ifindex=34 alg=0 addr=0x84e1b34 key_idx=0 set_tx=0 seq_len=0 key_len=0
1253526175.739537:    addr=00:13:19:80:da:30
1253526175.739575: State: COMPLETED -> AUTHENTICATING
1253526175.739593: EAPOL: External notification - EAP success=0
1253526175.739609: EAPOL: External notification - EAP fail=0
1253526175.739622: EAPOL: External notification - portControl=Auto
1253526175.739634: EAPOL: SUPP_PAE entering state DISCONNECTED
1253526175.739646: EAPOL: Supplicant port status: Unauthorized
1253526175.739713: EAPOL: SUPP_BE entering state INITIALIZE
1253526175.739732: EAPOL: SUPP_PAE entering state CONNECTING
1253526175.739744: EAPOL: enable timer tick
1253526175.739757: EAPOL: SUPP_BE entering state IDLE
1253526175.739773: nl80211: Authenticate (ifindex=34)
1253526175.739787:   * bssid=00:13:19:80:da:30
1253526175.739802:   * freq=2412
1253526175.739813:   * SSID - hexdump_ascii(len=4):
     4d 4e 48 53                                       MNHS
1253526175.739845:   * IEs - hexdump(len=0): [NULL]
1253526175.739858:   * Auth Type 0
1253526175.739899: nl80211: Authentication request send successfully
1253526175.739918: RSN: Ignored PMKID candidate without preauth flag
1253526175.739936: RTM_NEWLINK: operstate=1 ifi_flags=0x11043 ([UP][RUNNING][LOWER_UP])
1253526175.739954: RTM_NEWLINK, IFLA_IFNAME: Interface 'eth1' added
1253526175.954553: nl80211: Event message available
1253526175.954592: nl80211: MLME event 37
1253526175.954608: SME: Authentication response: peer=00:13:19:80:da:30 auth_type=0 status_code=0
1253526175.954635: Trying to associate with 00:13:19:80:da:30 (SSID='MNHS' freq=2412 MHz)
1253526175.954649: State: AUTHENTICATING -> ASSOCIATING
1253526175.954662: wpa_driver_nl80211_set_operstate: operstate 1->0 (DORMANT)
1253526175.954677: nl80211: Operstate: linkmode=-1, operstate=5
1253526175.955294: nl80211: Associate (ifindex=34)
1253526175.955312:   * bssid=00:13:19:80:da:30
1253526175.955327:   * freq=2412
1253526175.955338:   * SSID - hexdump_ascii(len=4):
     4d 4e 48 53                                       MNHS
1253526175.955370:   * IEs - hexdump(len=22): 30 14 01 00 00 0f ac 04 01 00 00 0f ac 04 01 00 00 0f ac 02 00 00
1253526175.955456: nl80211: MLME command failed: ret=-114 (Operation already in progress)
1253526175.955479: Association request to the driver failed
1253526175.955512: RTM_NEWLINK: operstate=0 ifi_flags=0x11003 ([UP][LOWER_UP])
1253526175.955530: RTM_NEWLINK, IFLA_IFNAME: Interface 'eth1' added
1253526178.742658: EAPOL: startWhen --> 0
1253526178.742678: EAPOL: disable timer tick
1253526178.742690: EAPOL: SUPP_PAE entering state CONNECTING
1253526178.742701: EAPOL: enable timer tick
1253526178.742714: EAPOL: txStart
1253526178.742725: WPA: drop TX EAPOL in non-IEEE 802.1X mode (type=1 len=0)
1253526208.774009: EAPOL: startWhen --> 0
1253526208.774043: EAPOL: disable timer tick
1253526208.774055: EAPOL: SUPP_PAE entering state CONNECTING
1253526208.774066: EAPOL: enable timer tick
1253526208.774079: EAPOL: txStart
1253526208.774090: WPA: drop TX EAPOL in non-IEEE 802.1X mode (type=1 len=0)
1253526238.805764: EAPOL: startWhen --> 0
1253526238.805790: EAPOL: disable timer tick
1253526238.805802: EAPOL: SUPP_PAE entering state CONNECTING
1253526238.805814: EAPOL: enable timer tick
1253526238.805827: EAPOL: txStart
1253526238.805838: WPA: drop TX EAPOL in non-IEEE 802.1X mode (type=1 len=0)
1253526268.837109: EAPOL: startWhen --> 0
1253526268.837130: EAPOL: disable timer tick
1253526268.837143: EAPOL: SUPP_PAE entering state HELD
1253526268.837154: EAPOL: enable timer tick
1253526268.837166: EAPOL: Supplicant port status: Unauthorized
1253526268.837296: EAPOL authentication completed unsuccessfully
1253526268.837322: Setting authentication timeout: 2 sec 0 usec
1253526270.838400: Authentication with 00:13:19:80:da:30 timed out.
1253526270.838421: BSSID 00:13:19:80:da:30 blacklist count incremented to 2
1253526270.838440: No keys have been configured - skip key clearing
1253526270.838453: State: ASSOCIATING -> DISCONNECTED
1253526270.838466: wpa_driver_nl80211_set_operstate: operstate 0->0 (DORMANT)
1253526270.838481: nl80211: Operstate: linkmode=-1, operstate=5
1253526270.838506: EAPOL: External notification - portEnabled=0
1253526270.838520: EAPOL: SUPP_PAE entering state DISCONNECTED
1253526270.838532: EAPOL: Supplicant port status: Unauthorized
1253526270.838600: EAPOL: SUPP_BE entering state INITIALIZE
1253526270.838619: EAPOL: Supplicant port status: Unauthorized
1253526270.838679: EAPOL: External notification - portValid=0
1253526270.838697: EAPOL: Supplicant port status: Unauthorized
1253526270.838755: EAPOL: External notification - EAP success=0
1253526270.838772: EAPOL: Supplicant port status: Unauthorized
1253526270.838837: Setting scan request: 0 sec 0 usec
1253526270.838867: EAPOL: Supplicant port status: Unauthorized
1253526270.838936: State: DISCONNECTED -> SCANNING
1253526270.838959: Starting AP scan for wildcard SSID
1253526270.839130: Scan requested (ret=0) - scan timeout 30 seconds
1253526270.839165: nl80211: Event message available
1253526270.839193: nl80211: Ignored unknown event (cmd=33)
1253526271.839909: EAPOL: Supplicant port status: Unauthorized
1253526272.840948: EAPOL: Supplicant port status: Unauthorized
1253526273.841988: EAPOL: Supplicant port status: Unauthorized
1253526274.843028: EAPOL: Supplicant port status: Unauthorized
1253526275.844068: EAPOL: Supplicant port status: Unauthorized
1253526276.845110: EAPOL: Supplicant port status: Unauthorized
1253526277.846150: EAPOL: Supplicant port status: Unauthorized
1253526278.847174: EAPOL: Supplicant port status: Unauthorized
1253526279.848214: EAPOL: Supplicant port status: Unauthorized
1253526280.849255: EAPOL: Supplicant port status: Unauthorized
1253526281.850294: EAPOL: Supplicant port status: Unauthorized
1253526282.851334: EAPOL: Supplicant port status: Unauthorized
1253526283.852374: EAPOL: Supplicant port status: Unauthorized
1253526284.853414: EAPOL: Supplicant port status: Unauthorized
1253526285.854454: EAPOL: Supplicant port status: Unauthorized
1253526286.855495: EAPOL: Supplicant port status: Unauthorized
1253526287.856534: EAPOL: Supplicant port status: Unauthorized
1253526288.857574: EAPOL: Supplicant port status: Unauthorized
1253526289.858613: EAPOL: Supplicant port status: Unauthorized
1253526290.859653: EAPOL: Supplicant port status: Unauthorized
1253526291.860693: EAPOL: Supplicant port status: Unauthorized
1253526292.861733: EAPOL: Supplicant port status: Unauthorized
1253526293.862773: EAPOL: Supplicant port status: Unauthorized
1253526294.863813: EAPOL: Supplicant port status: Unauthorized
1253526295.864853: EAPOL: Supplicant port status: Unauthorized

... AD INIFINITUM ...

-- 
http://www.holgerschurig.de

^ permalink raw reply

* Re: ipw2200: firmware DMA loading rework
From: Pekka Enberg @ 2009-09-21 10:56 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Mel Gorman, Luis R. Rodriguez, Tso Ted, Aneesh Kumar K.V, Zhu Yi,
	Andrew Morton, Johannes Weiner, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
	netdev@vger.kernel.org, linux-mm@kvack.org, James Ketrenos,
	Chatre, Reinette, linux-wireless@vger.kernel.org,
	ipw2100-devel@lists.sourceforge.net
In-Reply-To: <200909211246.34774.bzolnier@gmail.com>

On Mon, 2009-09-21 at 12:46 +0200, Bartlomiej Zolnierkiewicz wrote:
> > > I don't know why people don't see it but for me it has a memory management
> > > regression and reliability issue written all over it.
> > 
> > Possibly but drivers that reload their firmware as a response to an
> > error condition is relatively new and loading network drivers while the
> > system is already up and running a long time does not strike me as
> > typical system behaviour.
> 
> Loading drivers after boot is a typical desktop/laptop behavior, please
> think about hotplug (the hardware in question is an USB dongle).

Yeah, I wonder what broke things. Did the wireless stack change in
2.6.31-rc1 too? IIRC Mel ruled out page allocator changes as a suspect.

			Pekka


^ permalink raw reply

* Re: ipw2200: firmware DMA loading rework
From: Bartlomiej Zolnierkiewicz @ 2009-09-21 10:46 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Luis R. Rodriguez, Tso Ted, Aneesh Kumar K.V, Zhu Yi,
	Andrew Morton, Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
	netdev@vger.kernel.org, linux-mm@kvack.org, James Ketrenos,
	Chatre, Reinette, linux-wireless@vger.kernel.org,
	ipw2100-devel@lists.sourceforge.net
In-Reply-To: <20090921100813.GL12726@csn.ul.ie>

On Monday 21 September 2009 12:08:13 Mel Gorman wrote:
> On Mon, Sep 21, 2009 at 11:59:27AM +0200, Bartlomiej Zolnierkiewicz wrote:
> > > > > > > On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
> > > > > > >> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> > > > > > >> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> > > > > > >> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
> > > > > > >>
> > > > > > >> s/2.6.30/2.6.31-rc6/
> > > > > > >>
> > > > > > >> The issue has always been there but it was some recent change that
> > > > > > >> explicitly triggered the allocation failures (after 2.6.31-rc1).
> > > > > > >
> > > > > > > ipw2200 fix works fine but yesterday I got the following error while mounting
> > > > > > > ext4 filesystem (mb_history is optional so the mount succeeded):
> > > > > > 
> > > > > > OK so the mount succeeded.
> > > > > > 
> > > > > > > EXT4-fs (dm-2): barriers enabled
> > > > > > > kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
> > > > > > > EXT4-fs (dm-2): internal journal on dm-2:8
> > > > > > > EXT4-fs (dm-2): delayed allocation enabled
> > > > > > > EXT4-fs: file extents enabled
> > > > > > > mount: page allocation failure. order:5, mode:0xc0d0
> > > > > > > Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
> > > > > > > Call Trace:
> > > > > > >  [<c0394de3>] ? printk+0xf/0x14
> > > > > > >  [<c016a693>] __alloc_pages_nodemask+0x400/0x442
> > > > > > >  [<c016a71b>] __get_free_pages+0xf/0x32
> > > > > > >  [<c01865cf>] __kmalloc+0x28/0xfa
> > > > > > >  [<c023d96f>] ? __spin_lock_init+0x28/0x4d
> > > > > > >  [<c01f529d>] ext4_mb_init+0x392/0x460
> > > > > > >  [<c01e99d2>] ext4_fill_super+0x1b96/0x2012
> > > > > > >  [<c0239bc8>] ? snprintf+0x15/0x17
> > > > > > >  [<c01c0b26>] ? disk_name+0x24/0x69
> > > > > > >  [<c018ba63>] get_sb_bdev+0xda/0x117
> > > > > > >  [<c01e6711>] ext4_get_sb+0x13/0x15
> > > > > > >  [<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
> > > > > > >  [<c018ad2d>] vfs_kern_mount+0x3b/0x76
> > > > > > >  [<c018adad>] do_kern_mount+0x33/0xbd
> > > > > > >  [<c019d0af>] do_mount+0x660/0x6b8
> > > > > > >  [<c016a71b>] ? __get_free_pages+0xf/0x32
> > > > > > >  [<c019d168>] sys_mount+0x61/0x99
> > > > > > >  [<c0102908>] sysenter_do_call+0x12/0x36
> > > > > > > Mem-Info:
> > > > > > > DMA per-cpu:
> > > > > > > CPU    0: hi:    0, btch:   1 usd:   0
> > > > > > > Normal per-cpu:
> > > > > > > CPU    0: hi:  186, btch:  31 usd:   0
> > > > > > > Active_anon:25471 active_file:22802 inactive_anon:25812
> > > > > > >  inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
> > > > > > >  free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
> > > > > > > DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> > > > > > > lowmem_reserve[]: 0 489 489
> > > > > > > Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> > > > > > > lowmem_reserve[]: 0 0 0
> > > > > > > DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
> > > > > > > Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
> > > > > > > 57947 total pagecache pages
> > > > > > > 878 pages in swap cache
> > > > > > > Swap cache stats: add 920, delete 42, find 11/11
> > > > > > > Free swap  = 1016436kB
> > > > > > > Total swap = 1020116kB
> > > > > > > 131056 pages RAM
> > > > > > > 4233 pages reserved
> > > > > > > 90573 pages shared
> > > > > > > 77286 pages non-shared
> > > > > > > EXT4-fs: mballoc enabled
> > > > > > > EXT4-fs (dm-2): mounted filesystem with ordered data mode
> > > > > > >
> > > > > > > Thus it seems like the original bug is still there and any ideas how to
> > > > > > > debug the problem further are appreciated..
> > > > > > >
> > > > > > > The complete dmesg and kernel config are here:
> > > > > > >
> > > > > > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
> > > > > > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
> > > > > > 
> > > > > > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > > > > > it is the same issue, which from the trace it seems it is, then this
> > > > > > is due to an extra kmalloc() allocation and this apparently will not
> > > > > > get fixed on 2.6.31 due to the closeness of the merge window and the
> > > > > > non-criticalness this issue has been deemed.
> > > > > > 
> > > > > > A patch fix is part of the ext4-patchqueue
> > > > > > http://repo.or.cz/w/ext4-patch-queue.git
> > > > > 
> > > > > Thanks for the pointer but the page allocation failures that I hit seem
> > > > > to be caused by the memory management itself and the ext4 issue fixed by:
> > > > > 
> > > > > http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=memory-leak-fix-ext4_group_info-allocation;h=c919fff34e70ec85f96d1833f9ce460c451000de;hb=HEAD
> > > > > 
> > > > > is a different problem (unrelated to this one).
> > > > 
> > > > Here is another data point.
> > > > 
> > > > This time it is an order-6 page allocation failure for rt2870sta
> > > > (w/ upcoming driver changes) and Linus' tree from few days ago..
> > > > 
> > > 
> > > It's another high-order atomic allocation which is difficult to grant.
> > > I didn't look closely, but is this the same type of thing - large allocation
> > > failure during firmware loading? If so, is this during resume or is the
> > > device being reloaded for some other reason?
> > 
> > Just modprobing the driver on a system running for some time.
> > 
> 
> Was this a common situation before?

Yes, just like firmware restarts with ipw2200.

> > > I suspect that there are going to be a few of these bugs cropping up
> > > every so often where network devices are assuming large atomic
> > > allocations will succeed because the "only time they happen" is during
> > > boot but these days are happening at runtime for other reasons.
> > 
> > I wouldn't go so far as calling a normal order-6 (256kB) allocation on
> > 512MB machine with 1024MB swap a bug.  Moreover such failures just never
> > happened before 2.6.31-rc1.
> 
> It's not that normal, it's an allocation that cannot sleep and cannot
> reclaim. Why is something like firmware loading allocating memory like

OK.

> that? Is this use of GFP_ATOMIC relatively recent or has it always been
> that way?

It has always been like that.

> > I don't know why people don't see it but for me it has a memory management
> > regression and reliability issue written all over it.
> > 
> 
> Possibly but drivers that reload their firmware as a response to an
> error condition is relatively new and loading network drivers while the
> system is already up and running a long time does not strike me as
> typical system behaviour.

Loading drivers after boot is a typical desktop/laptop behavior, please
think about hotplug (the hardware in question is an USB dongle).

^ permalink raw reply

* Re: ipw2200: firmware DMA loading rework
From: Mel Gorman @ 2009-09-21 10:08 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Luis R. Rodriguez, Tso Ted, Aneesh Kumar K.V, Zhu Yi,
	Andrew Morton, Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
	netdev@vger.kernel.org, linux-mm@kvack.org, James Ketrenos,
	Chatre, Reinette, linux-wireless@vger.kernel.org,
	ipw2100-devel@lists.sourceforge.net
In-Reply-To: <200909211159.27344.bzolnier@gmail.com>

On Mon, Sep 21, 2009 at 11:59:27AM +0200, Bartlomiej Zolnierkiewicz wrote:
> > > > > > On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
> > > > > >> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> > > > > >> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> > > > > >> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
> > > > > >>
> > > > > >> s/2.6.30/2.6.31-rc6/
> > > > > >>
> > > > > >> The issue has always been there but it was some recent change that
> > > > > >> explicitly triggered the allocation failures (after 2.6.31-rc1).
> > > > > >
> > > > > > ipw2200 fix works fine but yesterday I got the following error while mounting
> > > > > > ext4 filesystem (mb_history is optional so the mount succeeded):
> > > > > 
> > > > > OK so the mount succeeded.
> > > > > 
> > > > > > EXT4-fs (dm-2): barriers enabled
> > > > > > kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
> > > > > > EXT4-fs (dm-2): internal journal on dm-2:8
> > > > > > EXT4-fs (dm-2): delayed allocation enabled
> > > > > > EXT4-fs: file extents enabled
> > > > > > mount: page allocation failure. order:5, mode:0xc0d0
> > > > > > Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
> > > > > > Call Trace:
> > > > > >  [<c0394de3>] ? printk+0xf/0x14
> > > > > >  [<c016a693>] __alloc_pages_nodemask+0x400/0x442
> > > > > >  [<c016a71b>] __get_free_pages+0xf/0x32
> > > > > >  [<c01865cf>] __kmalloc+0x28/0xfa
> > > > > >  [<c023d96f>] ? __spin_lock_init+0x28/0x4d
> > > > > >  [<c01f529d>] ext4_mb_init+0x392/0x460
> > > > > >  [<c01e99d2>] ext4_fill_super+0x1b96/0x2012
> > > > > >  [<c0239bc8>] ? snprintf+0x15/0x17
> > > > > >  [<c01c0b26>] ? disk_name+0x24/0x69
> > > > > >  [<c018ba63>] get_sb_bdev+0xda/0x117
> > > > > >  [<c01e6711>] ext4_get_sb+0x13/0x15
> > > > > >  [<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
> > > > > >  [<c018ad2d>] vfs_kern_mount+0x3b/0x76
> > > > > >  [<c018adad>] do_kern_mount+0x33/0xbd
> > > > > >  [<c019d0af>] do_mount+0x660/0x6b8
> > > > > >  [<c016a71b>] ? __get_free_pages+0xf/0x32
> > > > > >  [<c019d168>] sys_mount+0x61/0x99
> > > > > >  [<c0102908>] sysenter_do_call+0x12/0x36
> > > > > > Mem-Info:
> > > > > > DMA per-cpu:
> > > > > > CPU    0: hi:    0, btch:   1 usd:   0
> > > > > > Normal per-cpu:
> > > > > > CPU    0: hi:  186, btch:  31 usd:   0
> > > > > > Active_anon:25471 active_file:22802 inactive_anon:25812
> > > > > >  inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
> > > > > >  free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
> > > > > > DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> > > > > > lowmem_reserve[]: 0 489 489
> > > > > > Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> > > > > > lowmem_reserve[]: 0 0 0
> > > > > > DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
> > > > > > Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
> > > > > > 57947 total pagecache pages
> > > > > > 878 pages in swap cache
> > > > > > Swap cache stats: add 920, delete 42, find 11/11
> > > > > > Free swap  = 1016436kB
> > > > > > Total swap = 1020116kB
> > > > > > 131056 pages RAM
> > > > > > 4233 pages reserved
> > > > > > 90573 pages shared
> > > > > > 77286 pages non-shared
> > > > > > EXT4-fs: mballoc enabled
> > > > > > EXT4-fs (dm-2): mounted filesystem with ordered data mode
> > > > > >
> > > > > > Thus it seems like the original bug is still there and any ideas how to
> > > > > > debug the problem further are appreciated..
> > > > > >
> > > > > > The complete dmesg and kernel config are here:
> > > > > >
> > > > > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
> > > > > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
> > > > > 
> > > > > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > > > > it is the same issue, which from the trace it seems it is, then this
> > > > > is due to an extra kmalloc() allocation and this apparently will not
> > > > > get fixed on 2.6.31 due to the closeness of the merge window and the
> > > > > non-criticalness this issue has been deemed.
> > > > > 
> > > > > A patch fix is part of the ext4-patchqueue
> > > > > http://repo.or.cz/w/ext4-patch-queue.git
> > > > 
> > > > Thanks for the pointer but the page allocation failures that I hit seem
> > > > to be caused by the memory management itself and the ext4 issue fixed by:
> > > > 
> > > > http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=memory-leak-fix-ext4_group_info-allocation;h=c919fff34e70ec85f96d1833f9ce460c451000de;hb=HEAD
> > > > 
> > > > is a different problem (unrelated to this one).
> > > 
> > > Here is another data point.
> > > 
> > > This time it is an order-6 page allocation failure for rt2870sta
> > > (w/ upcoming driver changes) and Linus' tree from few days ago..
> > > 
> > 
> > It's another high-order atomic allocation which is difficult to grant.
> > I didn't look closely, but is this the same type of thing - large allocation
> > failure during firmware loading? If so, is this during resume or is the
> > device being reloaded for some other reason?
> 
> Just modprobing the driver on a system running for some time.
> 

Was this a common situation before?

> > I suspect that there are going to be a few of these bugs cropping up
> > every so often where network devices are assuming large atomic
> > allocations will succeed because the "only time they happen" is during
> > boot but these days are happening at runtime for other reasons.
> 
> I wouldn't go so far as calling a normal order-6 (256kB) allocation on
> 512MB machine with 1024MB swap a bug.  Moreover such failures just never
> happened before 2.6.31-rc1.

It's not that normal, it's an allocation that cannot sleep and cannot
reclaim. Why is something like firmware loading allocating memory like
that? Is this use of GFP_ATOMIC relatively recent or has it always been
that way?

> I don't know why people don't see it but for me it has a memory management
> regression and reliability issue written all over it.
> 

Possibly but drivers that reload their firmware as a response to an
error condition is relatively new and loading network drivers while the
system is already up and running a long time does not strike me as
typical system behaviour.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* Re: ipw2200: firmware DMA loading rework
From: Bartlomiej Zolnierkiewicz @ 2009-09-21  9:59 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Luis R. Rodriguez, Tso Ted, Aneesh Kumar K.V, Zhu Yi,
	Andrew Morton, Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
	netdev@vger.kernel.org, linux-mm@kvack.org, James Ketrenos,
	Chatre, Reinette, linux-wireless@vger.kernel.org,
	ipw2100-devel@lists.sourceforge.net
In-Reply-To: <20090921085843.GF12726@csn.ul.ie>

On Monday 21 September 2009 10:58:44 Mel Gorman wrote:
> On Sat, Sep 19, 2009 at 03:25:32PM +0200, Bartlomiej Zolnierkiewicz wrote:
> > On Wednesday 02 September 2009 20:26:17 Bartlomiej Zolnierkiewicz wrote:
> > > On Wednesday 02 September 2009 20:02:14 Luis R. Rodriguez wrote:
> > > > On Wed, Sep 2, 2009 at 10:48 AM, Bartlomiej
> > > > Zolnierkiewicz<bzolnier@gmail.com> wrote:
> > > > > On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
> > > > >> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> > > > >> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> > > > >> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
> > > > >>
> > > > >> s/2.6.30/2.6.31-rc6/
> > > > >>
> > > > >> The issue has always been there but it was some recent change that
> > > > >> explicitly triggered the allocation failures (after 2.6.31-rc1).
> > > > >
> > > > > ipw2200 fix works fine but yesterday I got the following error while mounting
> > > > > ext4 filesystem (mb_history is optional so the mount succeeded):
> > > > 
> > > > OK so the mount succeeded.
> > > > 
> > > > > EXT4-fs (dm-2): barriers enabled
> > > > > kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
> > > > > EXT4-fs (dm-2): internal journal on dm-2:8
> > > > > EXT4-fs (dm-2): delayed allocation enabled
> > > > > EXT4-fs: file extents enabled
> > > > > mount: page allocation failure. order:5, mode:0xc0d0
> > > > > Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
> > > > > Call Trace:
> > > > >  [<c0394de3>] ? printk+0xf/0x14
> > > > >  [<c016a693>] __alloc_pages_nodemask+0x400/0x442
> > > > >  [<c016a71b>] __get_free_pages+0xf/0x32
> > > > >  [<c01865cf>] __kmalloc+0x28/0xfa
> > > > >  [<c023d96f>] ? __spin_lock_init+0x28/0x4d
> > > > >  [<c01f529d>] ext4_mb_init+0x392/0x460
> > > > >  [<c01e99d2>] ext4_fill_super+0x1b96/0x2012
> > > > >  [<c0239bc8>] ? snprintf+0x15/0x17
> > > > >  [<c01c0b26>] ? disk_name+0x24/0x69
> > > > >  [<c018ba63>] get_sb_bdev+0xda/0x117
> > > > >  [<c01e6711>] ext4_get_sb+0x13/0x15
> > > > >  [<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
> > > > >  [<c018ad2d>] vfs_kern_mount+0x3b/0x76
> > > > >  [<c018adad>] do_kern_mount+0x33/0xbd
> > > > >  [<c019d0af>] do_mount+0x660/0x6b8
> > > > >  [<c016a71b>] ? __get_free_pages+0xf/0x32
> > > > >  [<c019d168>] sys_mount+0x61/0x99
> > > > >  [<c0102908>] sysenter_do_call+0x12/0x36
> > > > > Mem-Info:
> > > > > DMA per-cpu:
> > > > > CPU    0: hi:    0, btch:   1 usd:   0
> > > > > Normal per-cpu:
> > > > > CPU    0: hi:  186, btch:  31 usd:   0
> > > > > Active_anon:25471 active_file:22802 inactive_anon:25812
> > > > >  inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
> > > > >  free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
> > > > > DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> > > > > lowmem_reserve[]: 0 489 489
> > > > > Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> > > > > lowmem_reserve[]: 0 0 0
> > > > > DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
> > > > > Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
> > > > > 57947 total pagecache pages
> > > > > 878 pages in swap cache
> > > > > Swap cache stats: add 920, delete 42, find 11/11
> > > > > Free swap  = 1016436kB
> > > > > Total swap = 1020116kB
> > > > > 131056 pages RAM
> > > > > 4233 pages reserved
> > > > > 90573 pages shared
> > > > > 77286 pages non-shared
> > > > > EXT4-fs: mballoc enabled
> > > > > EXT4-fs (dm-2): mounted filesystem with ordered data mode
> > > > >
> > > > > Thus it seems like the original bug is still there and any ideas how to
> > > > > debug the problem further are appreciated..
> > > > >
> > > > > The complete dmesg and kernel config are here:
> > > > >
> > > > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
> > > > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
> > > > 
> > > > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > > > it is the same issue, which from the trace it seems it is, then this
> > > > is due to an extra kmalloc() allocation and this apparently will not
> > > > get fixed on 2.6.31 due to the closeness of the merge window and the
> > > > non-criticalness this issue has been deemed.
> > > > 
> > > > A patch fix is part of the ext4-patchqueue
> > > > http://repo.or.cz/w/ext4-patch-queue.git
> > > 
> > > Thanks for the pointer but the page allocation failures that I hit seem
> > > to be caused by the memory management itself and the ext4 issue fixed by:
> > > 
> > > http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=memory-leak-fix-ext4_group_info-allocation;h=c919fff34e70ec85f96d1833f9ce460c451000de;hb=HEAD
> > > 
> > > is a different problem (unrelated to this one).
> > 
> > Here is another data point.
> > 
> > This time it is an order-6 page allocation failure for rt2870sta
> > (w/ upcoming driver changes) and Linus' tree from few days ago..
> > 
> 
> It's another high-order atomic allocation which is difficult to grant.
> I didn't look closely, but is this the same type of thing - large allocation
> failure during firmware loading? If so, is this during resume or is the
> device being reloaded for some other reason?

Just modprobing the driver on a system running for some time.

> I suspect that there are going to be a few of these bugs cropping up
> every so often where network devices are assuming large atomic
> allocations will succeed because the "only time they happen" is during
> boot but these days are happening at runtime for other reasons.

I wouldn't go so far as calling a normal order-6 (256kB) allocation on
512MB machine with 1024MB swap a bug.  Moreover such failures just never
happened before 2.6.31-rc1.

I don't know why people don't see it but for me it has a memory management
regression and reliability issue written all over it.

^ permalink raw reply

* Re: ipw2200: firmware DMA loading rework
From: Mel Gorman @ 2009-09-21  8:58 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Luis R. Rodriguez, Tso Ted, Aneesh Kumar K.V, Zhu Yi,
	Andrew Morton, Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
	netdev@vger.kernel.org, linux-mm@kvack.org, James Ketrenos,
	Chatre, Reinette, linux-wireless@vger.kernel.org,
	ipw2100-devel@lists.sourceforge.net
In-Reply-To: <200909191525.33297.bzolnier@gmail.com>

On Sat, Sep 19, 2009 at 03:25:32PM +0200, Bartlomiej Zolnierkiewicz wrote:
> On Wednesday 02 September 2009 20:26:17 Bartlomiej Zolnierkiewicz wrote:
> > On Wednesday 02 September 2009 20:02:14 Luis R. Rodriguez wrote:
> > > On Wed, Sep 2, 2009 at 10:48 AM, Bartlomiej
> > > Zolnierkiewicz<bzolnier@gmail.com> wrote:
> > > > On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
> > > >> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> > > >> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> > > >> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
> > > >>
> > > >> s/2.6.30/2.6.31-rc6/
> > > >>
> > > >> The issue has always been there but it was some recent change that
> > > >> explicitly triggered the allocation failures (after 2.6.31-rc1).
> > > >
> > > > ipw2200 fix works fine but yesterday I got the following error while mounting
> > > > ext4 filesystem (mb_history is optional so the mount succeeded):
> > > 
> > > OK so the mount succeeded.
> > > 
> > > > EXT4-fs (dm-2): barriers enabled
> > > > kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
> > > > EXT4-fs (dm-2): internal journal on dm-2:8
> > > > EXT4-fs (dm-2): delayed allocation enabled
> > > > EXT4-fs: file extents enabled
> > > > mount: page allocation failure. order:5, mode:0xc0d0
> > > > Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
> > > > Call Trace:
> > > >  [<c0394de3>] ? printk+0xf/0x14
> > > >  [<c016a693>] __alloc_pages_nodemask+0x400/0x442
> > > >  [<c016a71b>] __get_free_pages+0xf/0x32
> > > >  [<c01865cf>] __kmalloc+0x28/0xfa
> > > >  [<c023d96f>] ? __spin_lock_init+0x28/0x4d
> > > >  [<c01f529d>] ext4_mb_init+0x392/0x460
> > > >  [<c01e99d2>] ext4_fill_super+0x1b96/0x2012
> > > >  [<c0239bc8>] ? snprintf+0x15/0x17
> > > >  [<c01c0b26>] ? disk_name+0x24/0x69
> > > >  [<c018ba63>] get_sb_bdev+0xda/0x117
> > > >  [<c01e6711>] ext4_get_sb+0x13/0x15
> > > >  [<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
> > > >  [<c018ad2d>] vfs_kern_mount+0x3b/0x76
> > > >  [<c018adad>] do_kern_mount+0x33/0xbd
> > > >  [<c019d0af>] do_mount+0x660/0x6b8
> > > >  [<c016a71b>] ? __get_free_pages+0xf/0x32
> > > >  [<c019d168>] sys_mount+0x61/0x99
> > > >  [<c0102908>] sysenter_do_call+0x12/0x36
> > > > Mem-Info:
> > > > DMA per-cpu:
> > > > CPU    0: hi:    0, btch:   1 usd:   0
> > > > Normal per-cpu:
> > > > CPU    0: hi:  186, btch:  31 usd:   0
> > > > Active_anon:25471 active_file:22802 inactive_anon:25812
> > > >  inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
> > > >  free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
> > > > DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> > > > lowmem_reserve[]: 0 489 489
> > > > Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> > > > lowmem_reserve[]: 0 0 0
> > > > DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
> > > > Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
> > > > 57947 total pagecache pages
> > > > 878 pages in swap cache
> > > > Swap cache stats: add 920, delete 42, find 11/11
> > > > Free swap  = 1016436kB
> > > > Total swap = 1020116kB
> > > > 131056 pages RAM
> > > > 4233 pages reserved
> > > > 90573 pages shared
> > > > 77286 pages non-shared
> > > > EXT4-fs: mballoc enabled
> > > > EXT4-fs (dm-2): mounted filesystem with ordered data mode
> > > >
> > > > Thus it seems like the original bug is still there and any ideas how to
> > > > debug the problem further are appreciated..
> > > >
> > > > The complete dmesg and kernel config are here:
> > > >
> > > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
> > > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
> > > 
> > > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > > it is the same issue, which from the trace it seems it is, then this
> > > is due to an extra kmalloc() allocation and this apparently will not
> > > get fixed on 2.6.31 due to the closeness of the merge window and the
> > > non-criticalness this issue has been deemed.
> > > 
> > > A patch fix is part of the ext4-patchqueue
> > > http://repo.or.cz/w/ext4-patch-queue.git
> > 
> > Thanks for the pointer but the page allocation failures that I hit seem
> > to be caused by the memory management itself and the ext4 issue fixed by:
> > 
> > http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=memory-leak-fix-ext4_group_info-allocation;h=c919fff34e70ec85f96d1833f9ce460c451000de;hb=HEAD
> > 
> > is a different problem (unrelated to this one).
> 
> Here is another data point.
> 
> This time it is an order-6 page allocation failure for rt2870sta
> (w/ upcoming driver changes) and Linus' tree from few days ago..
> 

It's another high-order atomic allocation which is difficult to grant.
I didn't look closely, but is this the same type of thing - large allocation
failure during firmware loading? If so, is this during resume or is the
device being reloaded for some other reason?

I suspect that there are going to be a few of these bugs cropping up
every so often where network devices are assuming large atomic
allocations will succeed because the "only time they happen" is during
boot but these days are happening at runtime for other reasons.

> ifconfig: page allocation failure. order:6, mode:0x8020
> Pid: 4752, comm: ifconfig Tainted: G        WC 2.6.31-04082-g1824090-dirty #80
> Call Trace:
>  [<c03996f2>] ? printk+0xf/0x15
>  [<c016b841>] __alloc_pages_nodemask+0x41d/0x462
>  [<c010681e>] dma_generic_alloc_coherent+0x53/0xbd
>  [<c02f83aa>] hcd_buffer_alloc+0xdb/0xe8
>  [<c01067cb>] ? dma_generic_alloc_coherent+0x0/0xbd
>  [<c02ee2d6>] usb_buffer_alloc+0x16/0x1d
>  [<e121b627>] NICInitTransmit+0xe2/0x7e4 [rt2870sta]
>  [<e121bfb1>] RTMPAllocTxRxRingMemory+0x11c/0x17b [rt2870sta]
>  [<e11f0960>] rt28xx_init+0xa5/0x3f8 [rt2870sta]
>  [<e121194a>] rt28xx_open+0x53/0xa2 [rt2870sta]
>  [<e1211b77>] MainVirtualIF_open+0x23/0xf6 [rt2870sta]
>  [<c03383a4>] dev_open+0x86/0xbb
>  [<c0337b1a>] dev_change_flags+0x96/0x147
>  [<c036e9cb>] devinet_ioctl+0x20f/0x4f8
>  [<c036fc8f>] inet_ioctl+0x8e/0xa7
>  [<c032ab50>] sock_ioctl+0x1c9/0x1ed
>  [<c032a987>] ? sock_ioctl+0x0/0x1ed
>  [<c0195732>] vfs_ioctl+0x18/0x71
>  [<c0195cbb>] do_vfs_ioctl+0x491/0x4cf
>  [<c01779d6>] ? handle_mm_fault+0x242/0x4ff
>  [<c0119609>] ? do_page_fault+0x102/0x292
>  [<c0140721>] ? up_read+0x16/0x29
>  [<c0195d27>] sys_ioctl+0x2e/0x48
>  [<c0102908>] sysenter_do_call+0x12/0x36
> Mem-Info:
> DMA per-cpu:
> CPU    0: hi:    0, btch:   1 usd:   0
> Normal per-cpu:
> CPU    0: hi:  186, btch:  31 usd:  84
> Active_anon:14664 active_file:30057 inactive_anon:31744
>  inactive_file:29940 unevictable:2 dirty:11 writeback:0 unstable:0
>  free:5421 slab:4037 mapped:7781 pagetables:963 bounce:0
> DMA free:2060kB min:84kB low:104kB high:124kB active_anon:0kB inactive_anon:124kB active_file:3284kB inactive_file:972kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 489 489
> Normal free:19624kB min:2788kB low:3484kB high:4180kB active_anon:58656kB inactive_anon:126852kB active_file:116944kB inactive_file:118788kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0
> DMA: 3*4kB 0*8kB 2*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2060kB
> Normal: 2180*4kB 625*8kB 303*16kB 33*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 19624kB
> 64568 total pagecache pages
> 3652 pages in swap cache
> Swap cache stats: add 21642, delete 17990, find 4906/6079
> Free swap  = 981700kB
> Total swap = 1020116kB
> 131056 pages RAM
> 4262 pages reserved
> 91941 pages shared
> 60834 pages non-shared
> <-- ERROR in Alloc TX TxContext[0] HTTX_BUFFER !! 
> <-- RTMPAllocTxRxRingMemory, Status=3
> ERROR!!! RTMPAllocDMAMemory failed, Status[=0x00000003]
> !!! rt28xx Initialized fail !!!
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply

* Re: [RFC] nl80211: introduce NL80211_ATTR_SCAN_EXPIRE
From: Holger Schurig @ 2009-09-21  7:46 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless@vger.kernel.org, John W Linville
In-Reply-To: <1253517333.4458.7.camel@johannes.local>

> It's not actually the same, and you didn't explain that well.
> You care about the disappear case, but you made it sound like
> you cared about the _reappear_ case.

Okay, sorry. I thought you read my other mails about 
scan-life-time and that thus the context was/is clear.

The base issue is: SCAN_TRIGGER does normally not a "clean scan", 
it normally adds entries to the BSS list or updates existing 
ones.

Entries in the BSS list are only deleted after 15 seconds.

However, in 15 seconds you can easily leave the range of AP_OLD 
and be in the range of a completely AP_NEW. But it can also be 
the case that the (now stale!) signal of AP_OLD is higher than 
the (real) signal of AP_NEW. In this case wpa_supplicant tries 
to associate to AP_OLD, which is out-of-sight. And that takes 
unneeded time.

I simple tried to mimick this scenario in the office, by 
switching off an AP (just that I didn't really switch it off, 
because the boot-times of Cisco-APs are soooooooo sloooooow).

About your "It's not actually the same": I think that my 
laboratory experiment very well shows this behavior, e.g. see my 
2nd message with subject "Life-time of scan-results?":

1253275108.958746: Trying to authenticate with 00:13:19:80:da:30 
(SSID='MNHS' freq=2412 MHz)

Bit this is the "vanished" AP_OLD. With wireshark on a second 
WLAN card I saw the attemps of mac80211 to associate to this 
now-out-of-sight AP. This takes some tries from mac80211, then a 
timeout on wpa_supplicant, than a new scan, then a new attempt. 
All of those delays completely unnecessary if there would have 
been a way to not get stale data via SCAN_DUMP.

So, clearly I have a visible problem and need to fix that.

I could fix that by making SCAN_TRIGGER always delete all stale 
(cached) entries. Then I wouldn't need NL80211_ATTR_SCAN_EXPIRE.

However, a scan because "I want to look what is around" might be 
different to a scan because "I need fresh data of APs around for 
associating". And so I thought I make that configurable.

I hope this now makes more sense.

^ permalink raw reply

* Re: [RFC] nl80211: introduce NL80211_ATTR_SCAN_EXPIRE
From: Johannes Berg @ 2009-09-21  7:15 UTC (permalink / raw)
  To: Holger Schurig; +Cc: linux-wireless@vger.kernel.org, John W Linville
In-Reply-To: <200909210842.59852.hs4233@mail.mn-solutions.de>

[-- Attachment #1: Type: text/plain, Size: 850 bytes --]

On Mon, 2009-09-21 at 08:42 +0200, Holger Schurig wrote:
> > > It also allows to set an expiration for
> > > NL80211_CMD_TRIGGER_SCAN. Setting the expiration to 0 will
> > > clean the whole BSS list.
> >
> > tbh, I don't really understand the need for it. changing your
> > AP settings doesn't seem like a use case anyone would really
> > care about that much.
> 
> Turning of an AP is one way making the AP vanish.
> 
> Sitting into a fork-lift and driving throught a ware-house hall 
> is another way to accomplish the same.

It's not actually the same, and you didn't explain that well. You care
about the disappear case, but you made it sound like you cared about the
_reappear_ case.

However, I still don't believe that you need to change anything in the
kernel, just get wpa_supplicant to handle it properly.

johannes


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [RFC] nl80211: introduce NL80211_ATTR_SCAN_EXPIRE
From: Holger Schurig @ 2009-09-21  7:11 UTC (permalink / raw)
  To: linux-wireless@vger.kernel.org; +Cc: Johannes Berg
In-Reply-To: <200909181849.22302.hs4233@mail.mn-solutions.de>

Hmm, I think I need some more check:

NL80211_CMD_SCAN_TRIGGER is a root-only operation
NL80211_CMD_GET_SCAN is an any-user operation.

I think that a non-root user shouldn't be allowed to specify 
NL80211_ATTR_SCAN_EXPIRE that is smaller than 
IEEE80211_SCAN_RESULT_EXPIRE, right?

-- 
http://www.holgerschurig.de

^ permalink raw reply

* getting "ath5k phy0: noise floor calibration timeout"
From: Holger Schurig @ 2009-09-21  7:02 UTC (permalink / raw)
  To: linux-wireless; +Cc: Luis R. Rodriguez

I kept my laptop the week-end on. The devices was unused during
that time. I got a bunch output into my syslog:

Sep 18 17:30:03 laptop kernel: ##HS cfg80211_bss_expire:129 0
Sep 18 17:30:03 laptop kernel: ##HS expire 00:1b:d4:44:35:90
Sep 18 17:30:03 laptop kernel: ##HS expire 00:13:19:80:da:30
Sep 18 17:30:03 laptop kernel: ##HS expire 00:1b:53:11:dc:40
Sep 18 18:15:04 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 18 18:26:12 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 19 11:12:09 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 19 11:29:36 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 19 11:47:57 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 19 12:31:05 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 19 13:06:42 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 19 14:07:51 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 19 14:10:31 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 19 14:12:00 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 20 00:06:20 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 20 07:46:24 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 20 16:36:43 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 20 16:47:43 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 20 16:52:43 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 20 17:01:43 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 20 20:07:57 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 21 06:22:55 laptop last message repeated 2 times
Sep 21 06:26:00 laptop last message repeated 2 times
Sep 21 06:28:26 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 21 06:33:14 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 21 06:38:34 laptop last message repeated 2 times
Sep 21 06:40:21 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 21 06:41:40 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 21 06:43:44 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 21 06:55:17 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 21 07:03:10 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 21 07:04:35 laptop last message repeated 2 times
Sep 21 07:06:11 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 21 07:16:58 laptop kernel: ath5k phy0: noise floor calibration timeout (2412MHz)
Sep 21 07:21:37 laptop last message repeated 2 times

Is this anything where I should care?  Or can I just
ignore that?  If the latter is the case, should I add a
patch that only three consecutive calibration timeouts
produce a message, so that the log doesn't get "spammed" ?

-- 
http://www.holgerschurig.de

^ permalink raw reply

* Re: Problems with "cfg80211: fix SME connect" commit
From: Holger Schurig @ 2009-09-21  6:45 UTC (permalink / raw)
  To: Albert Herranz; +Cc: johannes, linville, linux-wireless
In-Reply-To: <505407.87957.qm@web28315.mail.ukl.yahoo.com>

Can you try "[PATCH] cfg80211: don't overwrite privacy setting" 
from [1]?

[1] http://marc.info/?l=linux-wireless&m=125323296617306&w=2

And can you use the mailing list archives?  This is now the third 
time this poppep up, I just copied Sedat's mail for you.

-- 
http://www.holgerschurig.de

^ permalink raw reply

* Re: [RFC] nl80211: introduce NL80211_ATTR_SCAN_EXPIRE
From: Holger Schurig @ 2009-09-21  6:42 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless@vger.kernel.org, John W Linville
In-Reply-To: <1253293842.4787.1.camel@johannes.local>

> > It also allows to set an expiration for
> > NL80211_CMD_TRIGGER_SCAN. Setting the expiration to 0 will
> > clean the whole BSS list.
>
> tbh, I don't really understand the need for it. changing your
> AP settings doesn't seem like a use case anyone would really
> care about that much.

Turning of an AP is one way making the AP vanish.

Sitting into a fork-lift and driving throught a ware-house hall 
is another way to accomplish the same.

However, it's way easier for me to change the AP's setting than 
to actually leave me office :-)


/me wishes kernel developers would have more sense for real 
life :-)

-- 
http://www.holgerschurig.de

^ permalink raw reply

* Re: ipw2200: firmware DMA loading rework
From: Bartlomiej Zolnierkiewicz @ 2009-09-19 13:25 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Tso Ted, Aneesh Kumar K.V, Zhu Yi, Andrew Morton, Mel Gorman,
	Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
	netdev@vger.kernel.org, linux-mm@kvack.org, James Ketrenos,
	Chatre, Reinette, linux-wireless@vger.kernel.org,
	ipw2100-devel@lists.sourceforge.net
In-Reply-To: <200909022026.17910.bzolnier@gmail.com>

On Wednesday 02 September 2009 20:26:17 Bartlomiej Zolnierkiewicz wrote:
> On Wednesday 02 September 2009 20:02:14 Luis R. Rodriguez wrote:
> > On Wed, Sep 2, 2009 at 10:48 AM, Bartlomiej
> > Zolnierkiewicz<bzolnier@gmail.com> wrote:
> > > On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
> > >> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> > >> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> > >> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
> > >>
> > >> s/2.6.30/2.6.31-rc6/
> > >>
> > >> The issue has always been there but it was some recent change that
> > >> explicitly triggered the allocation failures (after 2.6.31-rc1).
> > >
> > > ipw2200 fix works fine but yesterday I got the following error while mounting
> > > ext4 filesystem (mb_history is optional so the mount succeeded):
> > 
> > OK so the mount succeeded.
> > 
> > > EXT4-fs (dm-2): barriers enabled
> > > kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
> > > EXT4-fs (dm-2): internal journal on dm-2:8
> > > EXT4-fs (dm-2): delayed allocation enabled
> > > EXT4-fs: file extents enabled
> > > mount: page allocation failure. order:5, mode:0xc0d0
> > > Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
> > > Call Trace:
> > >  [<c0394de3>] ? printk+0xf/0x14
> > >  [<c016a693>] __alloc_pages_nodemask+0x400/0x442
> > >  [<c016a71b>] __get_free_pages+0xf/0x32
> > >  [<c01865cf>] __kmalloc+0x28/0xfa
> > >  [<c023d96f>] ? __spin_lock_init+0x28/0x4d
> > >  [<c01f529d>] ext4_mb_init+0x392/0x460
> > >  [<c01e99d2>] ext4_fill_super+0x1b96/0x2012
> > >  [<c0239bc8>] ? snprintf+0x15/0x17
> > >  [<c01c0b26>] ? disk_name+0x24/0x69
> > >  [<c018ba63>] get_sb_bdev+0xda/0x117
> > >  [<c01e6711>] ext4_get_sb+0x13/0x15
> > >  [<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
> > >  [<c018ad2d>] vfs_kern_mount+0x3b/0x76
> > >  [<c018adad>] do_kern_mount+0x33/0xbd
> > >  [<c019d0af>] do_mount+0x660/0x6b8
> > >  [<c016a71b>] ? __get_free_pages+0xf/0x32
> > >  [<c019d168>] sys_mount+0x61/0x99
> > >  [<c0102908>] sysenter_do_call+0x12/0x36
> > > Mem-Info:
> > > DMA per-cpu:
> > > CPU    0: hi:    0, btch:   1 usd:   0
> > > Normal per-cpu:
> > > CPU    0: hi:  186, btch:  31 usd:   0
> > > Active_anon:25471 active_file:22802 inactive_anon:25812
> > >  inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
> > >  free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
> > > DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> > > lowmem_reserve[]: 0 489 489
> > > Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> > > lowmem_reserve[]: 0 0 0
> > > DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
> > > Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
> > > 57947 total pagecache pages
> > > 878 pages in swap cache
> > > Swap cache stats: add 920, delete 42, find 11/11
> > > Free swap  = 1016436kB
> > > Total swap = 1020116kB
> > > 131056 pages RAM
> > > 4233 pages reserved
> > > 90573 pages shared
> > > 77286 pages non-shared
> > > EXT4-fs: mballoc enabled
> > > EXT4-fs (dm-2): mounted filesystem with ordered data mode
> > >
> > > Thus it seems like the original bug is still there and any ideas how to
> > > debug the problem further are appreciated..
> > >
> > > The complete dmesg and kernel config are here:
> > >
> > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
> > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
> > 
> > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > it is the same issue, which from the trace it seems it is, then this
> > is due to an extra kmalloc() allocation and this apparently will not
> > get fixed on 2.6.31 due to the closeness of the merge window and the
> > non-criticalness this issue has been deemed.
> > 
> > A patch fix is part of the ext4-patchqueue
> > http://repo.or.cz/w/ext4-patch-queue.git
> 
> Thanks for the pointer but the page allocation failures that I hit seem
> to be caused by the memory management itself and the ext4 issue fixed by:
> 
> http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=memory-leak-fix-ext4_group_info-allocation;h=c919fff34e70ec85f96d1833f9ce460c451000de;hb=HEAD
> 
> is a different problem (unrelated to this one).

Here is another data point.

This time it is an order-6 page allocation failure for rt2870sta
(w/ upcoming driver changes) and Linus' tree from few days ago..

ifconfig: page allocation failure. order:6, mode:0x8020
Pid: 4752, comm: ifconfig Tainted: G        WC 2.6.31-04082-g1824090-dirty #80
Call Trace:
 [<c03996f2>] ? printk+0xf/0x15
 [<c016b841>] __alloc_pages_nodemask+0x41d/0x462
 [<c010681e>] dma_generic_alloc_coherent+0x53/0xbd
 [<c02f83aa>] hcd_buffer_alloc+0xdb/0xe8
 [<c01067cb>] ? dma_generic_alloc_coherent+0x0/0xbd
 [<c02ee2d6>] usb_buffer_alloc+0x16/0x1d
 [<e121b627>] NICInitTransmit+0xe2/0x7e4 [rt2870sta]
 [<e121bfb1>] RTMPAllocTxRxRingMemory+0x11c/0x17b [rt2870sta]
 [<e11f0960>] rt28xx_init+0xa5/0x3f8 [rt2870sta]
 [<e121194a>] rt28xx_open+0x53/0xa2 [rt2870sta]
 [<e1211b77>] MainVirtualIF_open+0x23/0xf6 [rt2870sta]
 [<c03383a4>] dev_open+0x86/0xbb
 [<c0337b1a>] dev_change_flags+0x96/0x147
 [<c036e9cb>] devinet_ioctl+0x20f/0x4f8
 [<c036fc8f>] inet_ioctl+0x8e/0xa7
 [<c032ab50>] sock_ioctl+0x1c9/0x1ed
 [<c032a987>] ? sock_ioctl+0x0/0x1ed
 [<c0195732>] vfs_ioctl+0x18/0x71
 [<c0195cbb>] do_vfs_ioctl+0x491/0x4cf
 [<c01779d6>] ? handle_mm_fault+0x242/0x4ff
 [<c0119609>] ? do_page_fault+0x102/0x292
 [<c0140721>] ? up_read+0x16/0x29
 [<c0195d27>] sys_ioctl+0x2e/0x48
 [<c0102908>] sysenter_do_call+0x12/0x36
Mem-Info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd:  84
Active_anon:14664 active_file:30057 inactive_anon:31744
 inactive_file:29940 unevictable:2 dirty:11 writeback:0 unstable:0
 free:5421 slab:4037 mapped:7781 pagetables:963 bounce:0
DMA free:2060kB min:84kB low:104kB high:124kB active_anon:0kB inactive_anon:124kB active_file:3284kB inactive_file:972kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 489 489
Normal free:19624kB min:2788kB low:3484kB high:4180kB active_anon:58656kB inactive_anon:126852kB active_file:116944kB inactive_file:118788kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 3*4kB 0*8kB 2*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2060kB
Normal: 2180*4kB 625*8kB 303*16kB 33*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 19624kB
64568 total pagecache pages
3652 pages in swap cache
Swap cache stats: add 21642, delete 17990, find 4906/6079
Free swap  = 981700kB
Total swap = 1020116kB
131056 pages RAM
4262 pages reserved
91941 pages shared
60834 pages non-shared
<-- ERROR in Alloc TX TxContext[0] HTTX_BUFFER !! 
<-- RTMPAllocTxRxRingMemory, Status=3
ERROR!!! RTMPAllocDMAMemory failed, Status[=0x00000003]
!!! rt28xx Initialized fail !!!

^ permalink raw reply

* Re: [PATCH 1/3] libertas: support boot commands to write persistent firmware and bootloader
From: David Woodhouse @ 2009-09-20 18:25 UTC (permalink / raw)
  To: Brian Cavagnolo, dcbw, Andrey Yurovsky
  Cc: linville, linux-wireless, libertas-dev
In-Reply-To: <487f84aa.1f068e0a.7ee7.3360@mx.google.com>

On Tue, 2008-07-15 at 17:45 -0700, Brian Cavagnolo wrote:
> Add locking and non-locking versions of if_usb_prog_firmware to support
> programming firmware after and before driver bring-up respectively.
> Add more suitable error codes for firmware programming process.
> Add capability checks for persistent features before attempting to use them.
> 
> Based on patches from Brajesh Dave and Priyank Singh.
> 
> Signed-off-by: Brian Cavagnolo <brian@cozybit.com>

Grrrr. We already ripped this _out_ of the driver once, and Dan
implemented a tool to do it in userspace where it belongs.

What was wrong with using the tool at git.infradead.org/libertas-flash? 

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply

* Problems with "cfg80211: fix SME connect" commit
From: Albert Herranz @ 2009-09-20 18:03 UTC (permalink / raw)
  To: johannes, linville; +Cc: linux-wireless

Hello,

Patch "cfg80211: fix SME connect" breaks my setup (WPA2/AES-CCMP between my b43 card and my AP).

When the patch is applied, the usual kernel messages "direct probe" and "authenticate" never happen, and IP address adquisition via dhcp fails:

[   12.709769] b43-phy0 debug: Chip initialized
[   12.725621] b43-phy0 debug: PIO initialized
[   12.733015] b43-phy0 debug: QoS disabled
[   12.913516] b43-phy0 debug: Wireless interface started
[   13.021170] b43-phy0 debug: Adding Interface type 2

When the patch is reverted everything works again:

[   13.222940] b43-phy0 debug: Chip initialized
[   13.291359] b43-phy0 debug: PIO initialized
[   13.330116] b43-phy0 debug: QoS disabled
[   13.861636] b43-phy0 debug: Wireless interface started
[   13.969153] b43-phy0 debug: Adding Interface type 2
[   16.679249] wlan1: direct probe to AP 00:12:17:15:e7:79 (try 1)
[   16.700998] wlan1 direct probe responded
[   16.707013] wlan1: authenticate with AP 00:12:17:15:e7:79 (try 1)
[   16.720205] wlan1: authenticated
[   16.726261] wlan1: associate with AP 00:12:17:15:e7:79 (try 1)
[   16.740697] wlan1: RX AssocResp from 00:12:17:15:e7:79 (capab=0x431 status=0 aid=1)
[   16.758042] wlan1: associated

Any clues?

Thanks,
Albert



      

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox