Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PULL] vhost: tcm_vhost fixes for 3.9
From: David Miller @ 2013-03-18 20:05 UTC (permalink / raw)
  To: mst; +Cc: kvm, virtualization, netdev, linux-kernel, asias, nab,
	target-devel
In-Reply-To: <20130318195443.GB19761@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Mon, 18 Mar 2013 21:54:43 +0200

> On Mon, Mar 18, 2013 at 01:01:36PM -0400, David Miller wrote:
>> From: "Michael S. Tsirkin" <mst@redhat.com>
>> Date: Mon, 18 Mar 2013 13:20:03 +0200
>> 
>> > The following changes since commit 8c6216d7f118a128678270824b6a1286a63863ca:
>> > 
>> >   Revert "ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally" (2013-03-16 23:00:41 -0400)
>> > 
>> > are available in the git repository at:
>> > 
>> >   git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net
>> > 
>> > for you to fetch changes up to deb7cb067dabd3be625eaf5495da8bdc97377fc1:
>> > 
>> >   tcm_vhost: Flush vhost_work in vhost_scsi_flush() (2013-03-17 13:04:14 +0200)
>> 
>> This is a scsi driver, I therefore don't think this pull request is for me.
>> 
>> Please avoid such confusion in the future, and don't use branch names
>> like "vhost-net" for SCSI driver fixes.
>> 
>> Thanks.
> 
> We are sharing quite a bit of code, so it's a common tree.
> There was a vhost-net patch there too but you picked it up
> directly so I dropped that and what's left is vhost-scsi.
> Typically vhost-net dominates so I'm just used to merging all vhost
> things through you but if you think it's wrong I can send it directly to
> Linus.

Just like any other "bus" or "layer" for drivers, you should
really just send the networking bits my way.

Now, in situations where some generic vhost infrastructure is
necessary for a vhost-net specific change, that might be a
situation where we can merge the infrastructure change into
my tree so that the networking bits can go in too.

^ permalink raw reply

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
From: David Miller @ 2013-03-18 20:05 UTC (permalink / raw)
  To: zenczykowski; +Cc: eilong, eric.dumazet, dmitry, netdev, yuvalmin
In-Reply-To: <CAHo-OoxaaLLXtAZcx99m7p5=Hez3gpUWxvoQAgksyG3PRZH-Ng@mail.gmail.com>

From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Mon, 18 Mar 2013 12:54:49 -0700

> I've reproduced this in the lab.
> 
> I can confirm this patch fixes per-queue statistics (ie. ethtool -S
> "[0]: tx_bytes") - which previously were not monotonically increasing.
> However there is still a bug wrt. global device stats (ie. ethtool -S
> "tx_bytes" and /proc/net/dev transmit bytes).
> I'm guessing there's another similar bug later on during aggregation.
> 
> I expect to find the offending code and send out a patch soon.

Thanks Maciej.

^ permalink raw reply

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
From: Maciej Żenczykowski @ 2013-03-18 20:06 UTC (permalink / raw)
  To: David Miller; +Cc: eilong, eric.dumazet, dmitry, netdev, yuvalmin
In-Reply-To: <CAHo-OoxaaLLXtAZcx99m7p5=Hez3gpUWxvoQAgksyG3PRZH-Ng@mail.gmail.com>

Current best guess is UPDATE_ESTAT_QSTAT_64 which uses SUB_64
implemented via DIFF_64 which is crazy.

On Mon, Mar 18, 2013 at 12:54 PM, Maciej Żenczykowski
<zenczykowski@gmail.com> wrote:
> I've reproduced this in the lab.
>
> I can confirm this patch fixes per-queue statistics (ie. ethtool -S
> "[0]: tx_bytes") - which previously were not monotonically increasing.
> However there is still a bug wrt. global device stats (ie. ethtool -S
> "tx_bytes" and /proc/net/dev transmit bytes).
> I'm guessing there's another similar bug later on during aggregation.
>
> I expect to find the offending code and send out a patch soon.
>
> On Mon, Mar 18, 2013 at 10:13 AM, David Miller <davem@davemloft.net> wrote:
>> From: "Eilon Greenstein" <eilong@broadcom.com>
>> Date: Mon, 18 Mar 2013 12:06:22 +0200
>>
>>> Maciej - thanks for the detailed information. You are right - it has
>>> nothing to do with the HW/FW and it is simply a bug that needs to be
>>> fixed. I withdraw my objections and add my ACK.
>>>
>>> Acked-by: Eilon Greenstein <eilong@broadcom.com>
>>
>> Applied and queued up for -stable.
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: sctp: hang in sctp_remaddr_seq_show
From: Neil Horman @ 2013-03-18 20:32 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: Eric Dumazet, Sasha Levin, sri, David S. Miller, linux-sctp,
	netdev, Dave Jones, linux-kernel@vger.kernel.org
In-Reply-To: <5147333A.3010907@gmail.com>

On Mon, Mar 18, 2013 at 11:31:06AM -0400, Vlad Yasevich wrote:
> On 03/18/2013 11:25 AM, Eric Dumazet wrote:
> >On Mon, 2013-03-18 at 07:04 -0400, Neil Horman wrote:
> >
> >>I'm not sure why the process would never get back to the schedule, but looking
> >>at the sctp_remaddr_seq_show function, I think that we should convert this
> >>sequence:
> >>	sctp_local_bh_disable();
> >>	read_lock(&head->lock);
> >>	rcu_read_lock();
> >>
> >>to this:
> >>	read_lock(&head->lock);
> >>	rcu_read_lock_bh();
> >>
> >>Neil
> >
> >I dont think so.
> >
> >BH needs to be disabled before read_lock(&head->lock);
> >
> >or else, write_lock() could deadlock (assuming it can be called from BH)
> >
> >
> 
> If anything, this should probably be done like this:
> 
> rcu_read_lock();
> read_lock_bh(&head->lock)
> ...
> 
> read_unlock_bh(&head->lock)
> rcu_read_unlock();
> 
Vlads, right.  We need to grab the rcu lock before the read lock, but we should
probably use the rcu_read_lock_bh variant, since we're going to disable bottom
halves anyway.
Neil

> -vlad
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
From: Eric Dumazet @ 2013-03-18 20:35 UTC (permalink / raw)
  To: Maciej Żenczykowski; +Cc: David Miller, eilong, dmitry, netdev, yuvalmin
In-Reply-To: <CAHo-OowoWD_BT2W8ok32pcSmcshXttpErna-X92Ojz_A8es8iw@mail.gmail.com>

On Mon, 2013-03-18 at 13:06 -0700, Maciej Żenczykowski wrote:
> Current best guess is UPDATE_ESTAT_QSTAT_64 which uses SUB_64
> implemented via DIFF_64 which is crazy.

All these macros are really ugly, I wish someone could cleanup this
code.

^ permalink raw reply

* Re: [PATCH] bnx2x: fix occasional statistics off-by-4GB error
From: Maciej Żenczykowski @ 2013-03-18 20:36 UTC (permalink / raw)
  To: David Miller; +Cc: eilong, eric.dumazet, dmitry, netdev, yuvalmin
In-Reply-To: <CAHo-OowoWD_BT2W8ok32pcSmcshXttpErna-X92Ojz_A8es8iw@mail.gmail.com>

Not quite the right call sequence above.
  UPDATE_FSTAT_QSTAT(total_bytes_received); --> SUB_64 --> DIFF_64
is probably more relevant.

Regardless:

/* difference = minuend - subtrahend */
#define DIFF_64(d_hi, m_hi, s_hi, d_lo, m_lo, s_lo) \
<------>do { \
<------><------>if (m_lo < s_lo) { \
<------><------><------>/* underflow */ \
<------><------><------>d_hi = m_hi - s_hi; \
<------><------><------>if (d_hi > 0) { \
<------><------><------><------>/* we can 'loan' 1 */ \
<------><------><------><------>d_hi--; \
<------><------><------><------>d_lo = m_lo + (UINT_MAX - s_lo) + 1; \
<------><------><------>} else { \
<------><------><------><------>/* m_hi <= s_hi */ \
<------><------><------><------>d_hi = 0; \
<------><------><------><------>d_lo = 0; \
<------><------><------>} \
<------><------>} else { \
...
I believe this fails.  All parameters are most likely going to be u32,
since that's used for stats pretty much everywhere.
As such after d_hi = m_hi - s_hi;  d_hi will be >= 0 since it's u32.

As such if "m_hi == s_hi" && "m_lo < s_lo" we will return (0,0)
however if "m_hi < s_hi" && "m_lo < s_lo" we will not return (0,0)

I'm not sure which behaviour is desired, but either way this is obviously wrong.

0 - 1 returns 0
0 - (4GB+1) returns -4GB-1

^ permalink raw reply

* Re: sctp: hang in sctp_remaddr_seq_show
From: Vlad Yasevich @ 2013-03-18 20:39 UTC (permalink / raw)
  To: Neil Horman
  Cc: Eric Dumazet, Sasha Levin, sri, David S. Miller, linux-sctp,
	netdev, Dave Jones, linux-kernel@vger.kernel.org
In-Reply-To: <20130318203202.GB9478@hmsreliant.think-freely.org>

On 03/18/2013 04:32 PM, Neil Horman wrote:
> On Mon, Mar 18, 2013 at 11:31:06AM -0400, Vlad Yasevich wrote:
>> On 03/18/2013 11:25 AM, Eric Dumazet wrote:
>>> On Mon, 2013-03-18 at 07:04 -0400, Neil Horman wrote:
>>>
>>>> I'm not sure why the process would never get back to the schedule, but looking
>>>> at the sctp_remaddr_seq_show function, I think that we should convert this
>>>> sequence:
>>>> 	sctp_local_bh_disable();
>>>> 	read_lock(&head->lock);
>>>> 	rcu_read_lock();
>>>>
>>>> to this:
>>>> 	read_lock(&head->lock);
>>>> 	rcu_read_lock_bh();
>>>>
>>>> Neil
>>>
>>> I dont think so.
>>>
>>> BH needs to be disabled before read_lock(&head->lock);
>>>
>>> or else, write_lock() could deadlock (assuming it can be called from BH)
>>>
>>>
>>
>> If anything, this should probably be done like this:
>>
>> rcu_read_lock();
>> read_lock_bh(&head->lock)
>> ...
>>
>> read_unlock_bh(&head->lock)
>> rcu_read_unlock();
>>
> Vlads, right.  We need to grab the rcu lock before the read lock, but we should
> probably use the rcu_read_lock_bh variant, since we're going to disable bottom
> halves anyway.

I don't think disabling bh as part of rcu gains us anything.  The main 
thing that has to happen is that it needs to be disabled before the hash 
read_lock().  Doing it my way means that we wouldn't have to touch 
call_rcu() sites.  If we change to rcu_read_lock_bh(), we could have to 
convert to call_rcu_bh() and still wouldn't see any gain.

In any case, this is all completely theoretical as the code the way it 
is now should still work and not hang in bh_enable.

Sasha, if you can trigger it easily enough, could you try the above 
alternatives.

Thanks
-vlad

> Neil
>
>> -vlad
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>

^ permalink raw reply

* Re: Who/What is supposed to remove IPv6 address from interface when moving from one network to another ?
From: Dan Williams @ 2013-03-18 20:42 UTC (permalink / raw)
  To: Sylvain Munaut; +Cc: Lorenzo Colitti, netdev@vger.kernel.org
In-Reply-To: <CAF6-1L7Von88pAhHzhyaJiL8x4w9cNpJw+k+0Giu5Jh-fYJ37g@mail.gmail.com>

On Mon, 2013-03-18 at 19:28 +0100, Sylvain Munaut wrote:
> Hi,
> 
> >> This might be a dumb idea but, would it make sense for the kernel not
> >> to use as source address, an address generated from a prefix for which
> >> the route has expired ?
> >
> > The route doesn't expire either. (And neither does the default gateway).
> 
> Ah sorry.
> 
> In my case the router lifetime is much shorter ( a few minutes ) and
> expired by the time I get home, which is why I was only faced with the
> "using bad source address" issue.
> 
> In anycase, following Dan's reponse, I just flush all ipv6 on suspend
> now ... (NetworkManager was a bit heavy-weight to just sort that
> particular issue ...)

While I'm not an IPv6 expert I would actually expect the kernel to stop
using any IPv6 address or route that had expired, and that was *added
automatically* by the kernel as a result of a router advertisement.
Perhaps that's not how it actually works, but would be nice to hear from
some kernel IPv6 people why that's not how it works, if that's the case.

Dan

^ permalink raw reply

* Re: sctp: hang in sctp_remaddr_seq_show
From: Eric Dumazet @ 2013-03-18 20:48 UTC (permalink / raw)
  To: Neil Horman
  Cc: Vlad Yasevich, Sasha Levin, sri, David S. Miller, linux-sctp,
	netdev, Dave Jones, linux-kernel@vger.kernel.org
In-Reply-To: <20130318203202.GB9478@hmsreliant.think-freely.org>

On Mon, 2013-03-18 at 16:32 -0400, Neil Horman wrote:

> Vlads, right.  We need to grab the rcu lock before the read lock, but we should
> probably use the rcu_read_lock_bh variant, since we're going to disable bottom
> halves anyway.
> Neil

rcu_read_lock_bh() and {local_bh_disable();rcu_read_lock();} have quite
different semantics.

If you use rcu_read_lock_bh(), you also need to change rcu_dereference()
calls to rcu_dereference_bh() variant.

^ permalink raw reply

* Re: Who/What is supposed to remove IPv6 address from interface when moving from one network to another ?
From: Mikael Abrahamsson @ 2013-03-18 20:49 UTC (permalink / raw)
  To: Dan Williams; +Cc: Sylvain Munaut, Lorenzo Colitti, netdev@vger.kernel.org
In-Reply-To: <1363639366.7698.10.camel@dcbw.foobar.com>

On Mon, 18 Mar 2013, Dan Williams wrote:

> While I'm not an IPv6 expert I would actually expect the kernel to stop 
> using any IPv6 address or route that had expired, and that was *added 
> automatically* by the kernel as a result of a router advertisement. 
> Perhaps that's not how it actually works, but would be nice to hear from 
> some kernel IPv6 people why that's not how it works, if that's the case.

It's my experience that the kernel doesn't detect link-down events and do 
anything useful with that information. When I have asked about this, I get 
server-usage scenarios and people don't want the kernel to do anything.

In older Ubuntu, when moving between wifis with different IPv6 subnets on 
them, you after a while were sitting there with IPv6 addresses from 
several different subnets, of which just one worked. Fail. I believe 
enhancements in the connection manager fixed this, because from 12.04 or 
something, this is not a problem anymore. When eth0 or wlan0 goes down, so 
IPv6 addresses and default routes go away just like the IPv4 equivalents.

Before the IPv4 equivalents went away on link-down but IPv6 was on 
autopilot by kernel seeing RAs and they stayed there long after they 
overspent their welcome.

So, my opinion is that the connection manager needs to take care of this, 
because the kernel doesn't do it (intentionally, by design).

I'm sure the kernel will stop using addresses after they expire, but that 
doesn't happen when a link goes down.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply

* Re: [PATCH] ath9k : Fix ieee80211 work while going to suspend
From: Luis R. Rodriguez @ 2013-03-18 21:03 UTC (permalink / raw)
  To: John W. Linville
  Cc: Parag Warudkar, Luis R. Rodriguez, Jouni Malinen,
	Vasanthakumar Thiagarajan, linux-wireless, ath9k-devel, netdev,
	LKML, senthilb
In-Reply-To: <20130318191340.GA28919@tuxdriver.com>

On Mon, Mar 18, 2013 at 03:13:41PM -0400, John W. Linville wrote:
> Any comments from the ath9k folks?
> 
> On Sat, Mar 16, 2013 at 12:38:14PM -0400, Parag Warudkar wrote:
> > During suspend below warning is seen when ath9k is active.  Attached
> > patch fixes the warning for me. Tested to work across few
> > suspend-resume cycles.
> > 
> > 
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.642939] WARNING: at
> > net/mac80211/util.c:599 ieee80211_can_queue_work.isra.7+0x32/0x40
> > [mac80211]()
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.642940] Hardware name: iMac12,1
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.642941] queueing ieee80211
> > work while going to suspend
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.642972] Modules linked in:
> > joydev hid_logitech_dj zfs(POF) bridge stp llc zunicode(POF) zavl(POF)
> > zcommon(POF) znvpair(POF) spl(OF) zlib_deflate be2iscsi
> > iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio
> > libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core
> > iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi rfcomm bnep
> > nls_utf8 hfsplus btusb uvcvideo videobuf2_vmalloc videobuf2_memops
> > videobuf2_core videodev bluetooth media coretemp snd_hda_codec_hdmi
> > snd_hda_codec_cirrus snd_hda_intel snd_hda_codec snd_hwdep snd_seq
> > snd_seq_device snd_pcm snd_page_alloc snd_timer snd soundcore arc4
> > ath9k ath9k_common microcode ath9k_hw ath mac80211 iTCO_wdt
> > iTCO_vendor_support cfg80211 lpc_ich mei mfd_core rfkill i2c_i801
> > applesmc apple_bl input_polldev vhost_net tun macvtap macvlan
> > kvm_intel kvm binfmt_misc uinput usb_storage crc32c_intel radeon ttm
> > i915 ghash_clmulni_intel firewire_ohci firewire_core i2c_algo_bit
> > drm_kms_helper crc_itu_t drm tg3 ptp pps_core i2c_core video sunrpc
> > [last unloaded: iptable_mangle]
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.642985] Pid: 0, comm:
> > swapper/0 Tainted: PF          O 3.8.2-206.fc18.x86_64 #1
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.642986] Call Trace:
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.642992]  <IRQ>
> > [<ffffffff8105e61f>] warn_slowpath_common+0x7f/0xc0
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643009]
> > [<ffffffffa0581420>] ? ath_start_rx_poll+0x70/0x70 [ath9k]
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643011]
> > [<ffffffff8105e716>] warn_slowpath_fmt+0x46/0x50
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643018]
> > [<ffffffffa045b542>] ieee80211_can_queue_work.isra.7+0x32/0x40
> > [mac80211]
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643024]
> > [<ffffffffa045b5de>] ieee80211_queue_work+0x2e/0x50 [mac80211]
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643027]
> > [<ffffffffa0581438>] ath_rx_poll+0x18/0x20 [ath9k]
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643029]
> > [<ffffffff8106d0aa>] call_timer_fn+0x3a/0x120
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643032]
> > [<ffffffffa0581420>] ? ath_start_rx_poll+0x70/0x70 [ath9k]
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643034]
> > [<ffffffff814fb240>] ? cpufreq_p4_target+0x120/0x120
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643035]
> > [<ffffffff8106ee3e>] run_timer_softirq+0x1fe/0x2b0
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643037]
> > [<ffffffff814fb240>] ? cpufreq_p4_target+0x120/0x120
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643038]
> > [<ffffffff81066cd0>] __do_softirq+0xd0/0x210
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643040]
> > [<ffffffff8101b913>] ? native_sched_clock+0x13/0x80
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643041]
> > [<ffffffff814fb240>] ? cpufreq_p4_target+0x120/0x120
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643043]
> > [<ffffffff8165985c>] call_softirq+0x1c/0x30
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643045]
> > [<ffffffff810162a5>] do_softirq+0x75/0xb0
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643046]
> > [<ffffffff81066fa5>] irq_exit+0xb5/0xc0
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643048]
> > [<ffffffff8165a1de>] smp_apic_timer_interrupt+0x6e/0x99
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643049]
> > [<ffffffff8165911d>] apic_timer_interrupt+0x6d/0x80
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643052]  <EOI>
> > [<ffffffff810868ce>] ? __hrtimer_start_range_ns+0x1be/0x400
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643053]
> > [<ffffffff814fbc30>] ? cpuidle_wrap_enter+0x50/0xa0
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643054]
> > [<ffffffff814fbc29>] ? cpuidle_wrap_enter+0x49/0xa0
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643056]
> > [<ffffffff814fbc90>] cpuidle_enter_tk+0x10/0x20
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643057]
> > [<ffffffff814fb8c9>] cpuidle_idle_call+0xa9/0x260
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643058]
> > [<ffffffff8101d45f>] cpu_idle+0xaf/0x120
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643061]
> > [<ffffffff81634522>] rest_init+0x72/0x80
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643063]
> > [<ffffffff81d00c40>] start_kernel+0x3d1/0x3de
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643065]
> > [<ffffffff81d0066e>] ? repair_env_string+0x5e/0x5e
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643066]
> > [<ffffffff81d00356>] x86_64_start_reservations+0x131/0x135
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643068]
> > [<ffffffff81d0045a>] x86_64_start_kernel+0x100/0x10f
> > Mar 16 09:39:17 Parags-iMac kernel: [ 3993.643068] ---[ end trace
> > 5595a7f5dd9a2949 ]---
> > 
> > Signed-off-by: Parag Warudkar <parag.lkml@gmail.com>
> > 
> > diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h
> > b/drivers/net/wireless/ath/ath9k/ath9k.h
> > index a56b241..b3088a1 100644
> > --- a/drivers/net/wireless/ath/ath9k/ath9k.h
> > +++ b/drivers/net/wireless/ath/ath9k/ath9k.h
> > @@ -764,6 +764,7 @@ struct ath_softc {
> >   atomic_t wow_got_bmiss_intr;
> >   atomic_t wow_sleep_proc_intr; /* in the middle of WoW sleep ? */
> >   u32 wow_intr_before_sleep;
> > + bool suspending;
> >  #endif
> >  };
> > 
> > diff --git a/drivers/net/wireless/ath/ath9k/link.c
> > b/drivers/net/wireless/ath/ath9k/link.c
> > index ade3afb..fa77e19 100644
> > --- a/drivers/net/wireless/ath/ath9k/link.c
> > +++ b/drivers/net/wireless/ath/ath9k/link.c
> > @@ -158,7 +158,8 @@ void ath_start_rx_poll(struct ath_softc *sc, u8 nbeacon)
> >  {
> >   if (!AR_SREV_9300(sc->sc_ah))
> >   return;
> > -
> > + if (sc->suspending)
> > + return;

Thanks for the patch! Please note the style issue here, you should
use a tab, but other than that lets review what happened.

> >   if (!test_bit(SC_OP_PRIM_STA_VIF, &sc->sc_flags))
> >   return;

Note that what this will do is call later mod_timer() for
rx_poll_timer, the right thing to do then, which would
be equivalent to your patch is to modify the ath_start_rx_poll()
to instead use the new API mod_timer_pending() added on v2.6.30
via commit 74019224. This would not re-arm the timer if it was
previously removed.

commit 74019224ac34b044b44a31dd89a54e3477db4896
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed Feb 18 12:23:29 2009 +0100

    timers: add mod_timer_pending()
    
    Impact: new timer API
    
    Based on an idea from Martin Josefsson with the help of
    Patrick McHardy and Stephen Hemminger:
    
    introduce the mod_timer_pending() API which is a mod_timer()
    offspring that is an invariant on already removed timers.
    
    (regular mod_timer() re-activates non-pending timers.)
    
    This is useful for the networking code in that it can
    allow unserialized mod_timer_pending() timer-forwarding
    calls, but a single del_timer*() will stop the timer
    from being reactivated again.
    
    Also while at it:
    
    - optimize the regular mod_timer() path some more, the
      timer-stat and a debug check was needlessly duplicated
      in __mod_timer().
    
    - make the exports come straight after the function, as
      most other exports in timer.c already did.
    
    - eliminate __mod_timer() as an external API, change the
      users to mod_timer().
    
    The regular mod_timer() code path is not impacted
    significantly, due to inlining optimizations and due to
    the simplifications.
    
    Based-on-patch-from: Stephen Hemminger <shemminger@vyatta.com>
    Acked-by: Stephen Hemminger <shemminger@vyatta.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Patrick McHardy <kaber@trash.net>
    Cc: netdev@vger.kernel.org
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

> > 
> > diff --git a/drivers/net/wireless/ath/ath9k/main.c
> > b/drivers/net/wireless/ath/ath9k/main.c
> > index 6e66f9c..0cf88b7 100644
> > --- a/drivers/net/wireless/ath/ath9k/main.c
> > +++ b/drivers/net/wireless/ath/ath9k/main.c
> > @@ -2150,7 +2150,7 @@ static int ath9k_suspend(struct ieee80211_hw *hw,
> >   int ret = 0;
> > 
> >   mutex_lock(&sc->mutex);
> > -
> > + sc->suspending = 1;
> >   ath_cancel_work(sc);
> >   ath_stop_ani(sc);
> >   del_timer_sync(&sc->rx_poll_timer);

See here we use del_timer_sync().

Can you try this instead for example:

diff --git a/drivers/net/wireless/ath/ath9k/link.c b/drivers/net/wireless/ath/ath9k/link.c
index ade3afb..fbd8b4c 100644
--- a/drivers/net/wireless/ath/ath9k/link.c
+++ b/drivers/net/wireless/ath/ath9k/link.c
@@ -162,8 +162,8 @@ void ath_start_rx_poll(struct ath_softc *sc, u8 nbeacon)
 	if (!test_bit(SC_OP_PRIM_STA_VIF, &sc->sc_flags))
 		return;
 
-	mod_timer(&sc->rx_poll_timer, jiffies + msecs_to_jiffies
-		  (nbeacon * sc->cur_beacon_conf.beacon_interval));
+	mod_timer_pending(&sc->rx_poll_timer, jiffies + msecs_to_jiffies
+			  (nbeacon * sc->cur_beacon_conf.beacon_interval));
 }
 
 void ath_rx_poll(unsigned long data)

Looking at this makes me think we should review all usage of
mod_timer all over our 802.11 drivers, and mac80211, cfg80211 as
well.

  Luis

^ permalink raw reply related

* Re: [PULL] vhost: tcm_vhost fixes for 3.9
From: Nicholas A. Bellinger @ 2013-03-18 21:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, kvm, virtualization, netdev, linux-kernel, asias,
	target-devel
In-Reply-To: <20130318195443.GB19761@redhat.com>

On Mon, 2013-03-18 at 21:54 +0200, Michael S. Tsirkin wrote:
> On Mon, Mar 18, 2013 at 01:01:36PM -0400, David Miller wrote:
> > From: "Michael S. Tsirkin" <mst@redhat.com>
> > Date: Mon, 18 Mar 2013 13:20:03 +0200
> > 
> > > The following changes since commit 8c6216d7f118a128678270824b6a1286a63863ca:
> > > 
> > >   Revert "ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally" (2013-03-16 23:00:41 -0400)
> > > 
> > > are available in the git repository at:
> > > 
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net
> > > 
> > > for you to fetch changes up to deb7cb067dabd3be625eaf5495da8bdc97377fc1:
> > > 
> > >   tcm_vhost: Flush vhost_work in vhost_scsi_flush() (2013-03-17 13:04:14 +0200)
> > 
> > This is a scsi driver, I therefore don't think this pull request is for me.
> > 
> > Please avoid such confusion in the future, and don't use branch names
> > like "vhost-net" for SCSI driver fixes.
> > 
> > Thanks.
> 
> We are sharing quite a bit of code, so it's a common tree.
> There was a vhost-net patch there too but you picked it up
> directly so I dropped that and what's left is vhost-scsi.
> Typically vhost-net dominates so I'm just used to merging all vhost
> things through you but if you think it's wrong I can send it directly to
> Linus.
> 

I'll be sending out a target-pending 3.9-rc-fixes pull request later in
the week, and am happy to include this series if you'd like.

--nab

^ permalink raw reply

* Re: [PULL] vhost: tcm_vhost fixes for 3.9
From: Michael S. Tsirkin @ 2013-03-18 21:14 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: David Miller, kvm, virtualization, netdev, linux-kernel, asias,
	target-devel
In-Reply-To: <1363641003.3156.43.camel@haakon2.linux-iscsi.org>

On Mon, Mar 18, 2013 at 02:10:03PM -0700, Nicholas A. Bellinger wrote:
> On Mon, 2013-03-18 at 21:54 +0200, Michael S. Tsirkin wrote:
> > On Mon, Mar 18, 2013 at 01:01:36PM -0400, David Miller wrote:
> > > From: "Michael S. Tsirkin" <mst@redhat.com>
> > > Date: Mon, 18 Mar 2013 13:20:03 +0200
> > > 
> > > > The following changes since commit 8c6216d7f118a128678270824b6a1286a63863ca:
> > > > 
> > > >   Revert "ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally" (2013-03-16 23:00:41 -0400)
> > > > 
> > > > are available in the git repository at:
> > > > 
> > > >   git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net
> > > > 
> > > > for you to fetch changes up to deb7cb067dabd3be625eaf5495da8bdc97377fc1:
> > > > 
> > > >   tcm_vhost: Flush vhost_work in vhost_scsi_flush() (2013-03-17 13:04:14 +0200)
> > > 
> > > This is a scsi driver, I therefore don't think this pull request is for me.
> > > 
> > > Please avoid such confusion in the future, and don't use branch names
> > > like "vhost-net" for SCSI driver fixes.
> > > 
> > > Thanks.
> > 
> > We are sharing quite a bit of code, so it's a common tree.
> > There was a vhost-net patch there too but you picked it up
> > directly so I dropped that and what's left is vhost-scsi.
> > Typically vhost-net dominates so I'm just used to merging all vhost
> > things through you but if you think it's wrong I can send it directly to
> > Linus.
> > 
> 
> I'll be sending out a target-pending 3.9-rc-fixes pull request later in
> the week, and am happy to include this series if you'd like.
> 
> --nab

Sure, go ahead. You can add
Acked-by: Michael S. Tsirkin <mst@redhat.com>
to the patches.

-- 
MST

^ permalink raw reply

* Re: [PULL] vhost: tcm_vhost fixes for 3.9
From: Nicholas A. Bellinger @ 2013-03-18 21:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, kvm, virtualization, netdev, linux-kernel, asias,
	target-devel
In-Reply-To: <20130318211418.GA20406@redhat.com>

On Mon, 2013-03-18 at 23:14 +0200, Michael S. Tsirkin wrote:
> On Mon, Mar 18, 2013 at 02:10:03PM -0700, Nicholas A. Bellinger wrote:
> > On Mon, 2013-03-18 at 21:54 +0200, Michael S. Tsirkin wrote:
> > > On Mon, Mar 18, 2013 at 01:01:36PM -0400, David Miller wrote:
> > > > From: "Michael S. Tsirkin" <mst@redhat.com>
> > > > Date: Mon, 18 Mar 2013 13:20:03 +0200
> > > > 
> > > > > The following changes since commit 8c6216d7f118a128678270824b6a1286a63863ca:
> > > > > 
> > > > >   Revert "ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally" (2013-03-16 23:00:41 -0400)
> > > > > 
> > > > > are available in the git repository at:
> > > > > 
> > > > >   git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net
> > > > > 
> > > > > for you to fetch changes up to deb7cb067dabd3be625eaf5495da8bdc97377fc1:
> > > > > 
> > > > >   tcm_vhost: Flush vhost_work in vhost_scsi_flush() (2013-03-17 13:04:14 +0200)
> > > > 
> > > > This is a scsi driver, I therefore don't think this pull request is for me.
> > > > 
> > > > Please avoid such confusion in the future, and don't use branch names
> > > > like "vhost-net" for SCSI driver fixes.
> > > > 
> > > > Thanks.
> > > 
> > > We are sharing quite a bit of code, so it's a common tree.
> > > There was a vhost-net patch there too but you picked it up
> > > directly so I dropped that and what's left is vhost-scsi.
> > > Typically vhost-net dominates so I'm just used to merging all vhost
> > > things through you but if you think it's wrong I can send it directly to
> > > Linus.
> > > 
> > 
> > I'll be sending out a target-pending 3.9-rc-fixes pull request later in
> > the week, and am happy to include this series if you'd like.
> > 
> > --nab
> 
> Sure, go ahead. You can add
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> to the patches.
> 

Applied to target-pending/master.

Thanks MST & Asias!

^ permalink raw reply

* [PATCH 0/4] Misc MRF24J40 Fixes
From: Alan Ott @ 2013-03-18 22:06 UTC (permalink / raw)
  To: linux-zigbee-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Alan Ott

These are fairly straight-forward.

Alan Ott (4):
  mrf24j40: pinctrl support
  mrf24j40: Warn if transmit interrupts timeout
  mrf24j40: Increase max SPI speed to 10MHz
  mrf24j40: Fix byte-order of IEEE address

 drivers/net/ieee802154/mrf24j40.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

-- 
1.7.11.2


------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_mar

^ permalink raw reply

* [PATCH 2/4] mrf24j40: Warn if transmit interrupts timeout
From: Alan Ott @ 2013-03-18 22:06 UTC (permalink / raw)
  To: linux-zigbee-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Alan Ott
In-Reply-To: <1363644403-11003-1-git-send-email-alan-yzvJWuRpmD1zbRFIqnYvSA@public.gmane.org>

Issue a warning if a transmit complete interrupt doesn't happen in time.

Signed-off-by: Alan Ott <alan-yzvJWuRpmD1zbRFIqnYvSA@public.gmane.org>
---
 drivers/net/ieee802154/mrf24j40.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ieee802154/mrf24j40.c b/drivers/net/ieee802154/mrf24j40.c
index 3106895..582c0a3 100644
--- a/drivers/net/ieee802154/mrf24j40.c
+++ b/drivers/net/ieee802154/mrf24j40.c
@@ -362,6 +362,7 @@ static int mrf24j40_tx(struct ieee802154_dev *dev, struct sk_buff *skb)
 	if (ret == -ERESTARTSYS)
 		goto err;
 	if (ret == 0) {
+		dev_warn(printdev(devrec), "Timeout waiting for TX interrupt\n");
 		ret = -ETIMEDOUT;
 		goto err;
 	}
-- 
1.7.11.2


------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_mar

^ permalink raw reply related

* [PATCH 3/4] mrf24j40: Increase max SPI speed to 10MHz
From: Alan Ott @ 2013-03-18 22:06 UTC (permalink / raw)
  To: linux-zigbee-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Alan Ott
In-Reply-To: <1363644403-11003-1-git-send-email-alan-yzvJWuRpmD1zbRFIqnYvSA@public.gmane.org>

Upon consulting the datasheet further, it does indicates a maximum speed
for SCK at 10MHz.

Signed-off-by: Alan Ott <alan-yzvJWuRpmD1zbRFIqnYvSA@public.gmane.org>
---
 drivers/net/ieee802154/mrf24j40.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ieee802154/mrf24j40.c b/drivers/net/ieee802154/mrf24j40.c
index 582c0a3..b4f9b67 100644
--- a/drivers/net/ieee802154/mrf24j40.c
+++ b/drivers/net/ieee802154/mrf24j40.c
@@ -92,9 +92,8 @@ struct mrf24j40 {
 #define MRF24J40_READLONG(reg) (1 << 15 | (reg) << 5)
 #define MRF24J40_WRITELONG(reg) (1 << 15 | (reg) << 5 | 1 << 4)
 
-/* Maximum speed to run the device at. TODO: Get the real max value from
- * someone at Microchip since it isn't in the datasheet. */
-#define MAX_SPI_SPEED_HZ 1000000
+/* The datasheet indicates the theoretical maximum for SCK to be 10MHz */
+#define MAX_SPI_SPEED_HZ 10000000
 
 #define printdev(X) (&X->spi->dev)
 
-- 
1.7.11.2


------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_mar

^ permalink raw reply related

* [PATCH 4/4] mrf24j40: Fix byte-order of IEEE address
From: Alan Ott @ 2013-03-18 22:06 UTC (permalink / raw)
  To: linux-zigbee-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Alan Ott
In-Reply-To: <1363644403-11003-1-git-send-email-alan-yzvJWuRpmD1zbRFIqnYvSA@public.gmane.org>

Load the 64-bit Extended (IEEE) address into the hardware in the proper
byte order.

Signed-off-by: Alan Ott <alan-yzvJWuRpmD1zbRFIqnYvSA@public.gmane.org>
---
 drivers/net/ieee802154/mrf24j40.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ieee802154/mrf24j40.c b/drivers/net/ieee802154/mrf24j40.c
index b4f9b67..0ca8f88 100644
--- a/drivers/net/ieee802154/mrf24j40.c
+++ b/drivers/net/ieee802154/mrf24j40.c
@@ -478,7 +478,7 @@ static int mrf24j40_filter(struct ieee802154_dev *dev,
 		int i;
 		for (i = 0; i < 8; i++)
 			write_short_reg(devrec, REG_EADR0+i,
-					filt->ieee_addr[i]);
+					filt->ieee_addr[7-i]);
 
 #ifdef DEBUG
 		printk(KERN_DEBUG "Set long addr to: ");
-- 
1.7.11.2


------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_mar

^ permalink raw reply related

* [PATCH 1/4] mrf24j40: pinctrl support
From: Alan Ott @ 2013-03-18 22:06 UTC (permalink / raw)
  To: linux-zigbee-devel, netdev
  Cc: Alexander Smirnov, Dmitry Eremin-Solenikov, linux-kernel,
	Alan Ott
In-Reply-To: <1363644403-11003-1-git-send-email-alan@signal11.us>

Activate pinctrl settings when used with a DT system.

Signed-off-by: Alan Ott <alan@signal11.us>
---
 drivers/net/ieee802154/mrf24j40.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ieee802154/mrf24j40.c b/drivers/net/ieee802154/mrf24j40.c
index 3f2c7aa..3106895 100644
--- a/drivers/net/ieee802154/mrf24j40.c
+++ b/drivers/net/ieee802154/mrf24j40.c
@@ -22,6 +22,7 @@
 #include <linux/spi/spi.h>
 #include <linux/interrupt.h>
 #include <linux/module.h>
+#include <linux/pinctrl/consumer.h>
 #include <net/wpan-phy.h>
 #include <net/mac802154.h>
 
@@ -623,6 +624,7 @@ static int mrf24j40_probe(struct spi_device *spi)
 	int ret = -ENOMEM;
 	u8 val;
 	struct mrf24j40 *devrec;
+	struct pinctrl *pinctrl;
 
 	printk(KERN_INFO "mrf24j40: probe(). IRQ: %d\n", spi->irq);
 
@@ -633,6 +635,11 @@ static int mrf24j40_probe(struct spi_device *spi)
 	if (!devrec->buf)
 		goto err_buf;
 
+	pinctrl = devm_pinctrl_get_select_default(&spi->dev);
+	if (IS_ERR(pinctrl))
+		dev_warn(&spi->dev,
+			"pinctrl pins are not configured from the driver");
+
 	spi->mode = SPI_MODE_0; /* TODO: Is this appropriate for right here? */
 	if (spi->max_speed_hz > MAX_SPI_SPEED_HZ)
 		spi->max_speed_hz = MAX_SPI_SPEED_HZ;
-- 
1.7.11.2

^ permalink raw reply related

* [PATCH 0/2] Address issues in dma-debug API
From: Alexander Duyck @ 2013-03-18 22:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: konrad.wilk, joerg.roedel, konrad, christoph.paasch, mingo,
	shuahkhan, hpa, akpm, shuah.khan, netdev, jeffrey.t.kirsher
In-Reply-To: <3729150.HPUjKjXiGc@cpaasch-mac>

Christoph Paasch recently reported a "device driver failed to check map 
error" on igb.  However after reviewing the code there was no possibility of
that.  On closer inspection there was a bug in the DMA debug API that was
causing the issue.  These two patches address the issues I found.

The first issue I found while trying to implement a workaround.  Specifically
the problem is a locking bug which is triggered if a multiple mapped buffer
exists and there is not an exact match for the unmap.  This results in the CPU
becoming deadlocked.

The second issue, which was the original problem, is resolved by guaranteeing
that if we call dma_mapping_error we set a matching entry to MAP_ERR_CHECKED
that was not previously set. 

I'm not sure if these are critical enough to go into one of the upcoming RC
kernels or if these can wait until the merge since this is in a debugging API.
I'm leaving that for the sub-maintainers to decide.

---

Alexander Duyck (2):
      dma-debug: Fix locking bug in check_unmap
      dma-debug: Update DMA debug API to better handle multiple mappings of a buffer

 lib/dma-debug.c |   42 ++++++++++++++++++++++++++++--------------
 1 files changed, 28 insertions(+), 14 deletions(-)

-- 

^ permalink raw reply

* [PATCH 1/2] dma-debug: Fix locking bug in check_unmap
From: Alexander Duyck @ 2013-03-18 22:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: konrad.wilk, joerg.roedel, konrad, christoph.paasch, mingo,
	shuahkhan, hpa, akpm, shuah.khan, netdev, jeffrey.t.kirsher
In-Reply-To: <20130318220241.7349.5030.stgit@ahduyck-cp1.jf.intel.com>

In check_unmap it is possible to get into a dead-locked state if
dma_mapping_error is called.  The problem is that the bucket is locked in
check_unmap, and locked again by debug_dma_mapping_error which is called by
dma_mapping_error.  To resolve that we must release the lock on the bucket
before making the call to dma_mapping_error.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 lib/dma-debug.c |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 5e396ac..724bd4d 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -862,17 +862,18 @@ static void check_unmap(struct dma_debug_entry *ref)
 	entry = bucket_find_exact(bucket, ref);
 
 	if (!entry) {
+		/* must drop lock before calling dma_mapping_error */
+		put_hash_bucket(bucket, &flags);
+
 		if (dma_mapping_error(ref->dev, ref->dev_addr)) {
 			err_printk(ref->dev, NULL,
-				   "DMA-API: device driver tries "
-				   "to free an invalid DMA memory address\n");
-			return;
+				   "DMA-API: device driver tries to free an invalid DMA memory address\n");
+		} else {
+			err_printk(ref->dev, NULL,
+				   "DMA-API: device driver tries to free DMA memory it has not allocated [device address=0x%016llx] [size=%llu bytes]\n",
+				   ref->dev_addr, ref->size);
 		}
-		err_printk(ref->dev, NULL, "DMA-API: device driver tries "
-			   "to free DMA memory it has not allocated "
-			   "[device address=0x%016llx] [size=%llu bytes]\n",
-			   ref->dev_addr, ref->size);
-		goto out;
+		return;
 	}
 
 	if (ref->size != entry->size) {
@@ -936,7 +937,6 @@ static void check_unmap(struct dma_debug_entry *ref)
 	hash_bucket_del(entry);
 	dma_entry_free(entry);
 
-out:
 	put_hash_bucket(bucket, &flags);
 }
 

^ permalink raw reply related

* [PATCH 2/2] dma-debug: Update DMA debug API to better handle multiple mappings of a buffer
From: Alexander Duyck @ 2013-03-18 22:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: konrad.wilk, joerg.roedel, konrad, christoph.paasch, mingo,
	shuahkhan, hpa, akpm, shuah.khan, netdev, jeffrey.t.kirsher
In-Reply-To: <20130318220241.7349.5030.stgit@ahduyck-cp1.jf.intel.com>

There were reports of the igb driver unmapping buffers without calling
dma_mapping_error.  On closer inspection issues were found in the DMA debug
API and how it handled multiple mappings of the same buffer.

The issue I found is the fact that the debug_dma_mapping_error would only set
the map_err_type to MAP_ERR_CHECKED in the case that the was only one match
for device and device address.  However in the case of non-IOMMU, multiple
addresses existed and as a result it was not setting this field once a
second mapping was instantiated.  I have resolved this by changing the search
so that it instead will now set MAP_ERR_CHECKED on the first buffer that
matches the device and DMA address that is currently in the state
MAP_ERR_NOT_CHECKED.

A secondary side effect of this patch is that in the case of multiple buffers
using the same address only the last mapping will have a valid map_err_type.
The previous mappings will all end up with map_err_type set to
MAP_ERR_CHECKED because of the dma_mapping_error call in debug_dma_map_page.
However this behavior may be preferable as it means you will likely only see
one real error per multi-mapped buffer, versus the current behavior of
multiple false errors mer multi-mapped buffer.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---
 lib/dma-debug.c |   24 +++++++++++++++++++-----
 1 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index 724bd4d..aa465d9 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -1082,13 +1082,27 @@ void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 	ref.dev = dev;
 	ref.dev_addr = dma_addr;
 	bucket = get_hash_bucket(&ref, &flags);
-	entry = bucket_find_exact(bucket, &ref);

-	if (!entry)
-		goto out;
+	list_for_each_entry(entry, &bucket->list, list) {
+		if (!exact_match(&ref, entry))
+			continue;
+
+		/*
+		 * The same physical address can be mapped multiple
+		 * times. Without a hardware IOMMU this results in the
+		 * same device addresses being put into the dma-debug
+		 * hash multiple times too. This can result in false
+		 * positives being reported. Therefore we implement a
+		 * best-fit algorithm here which updates the first entry
+		 * from the hash which fits the reference value and is
+		 * not currently listed as being checked.
+		 */
+		if (entry->map_err_type == MAP_ERR_NOT_CHECKED) {
+			entry->map_err_type = MAP_ERR_CHECKED;
+			break;
+		}
+	}

-	entry->map_err_type = MAP_ERR_CHECKED;
-out:
 	put_hash_bucket(bucket, &flags);
 }
 EXPORT_SYMBOL(debug_dma_mapping_error);

^ permalink raw reply related

* Re: [PATCH net-next v2] packet: packet fanout rollover during socket overload
From: Eric Dumazet @ 2013-03-18 23:10 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev, davem
In-Reply-To: <1363648051-5976-1-git-send-email-willemb@google.com>

On Mon, 2013-03-18 at 19:07 -0400, Willem de Bruijn wrote:
> Minimize packet drop in a fanout group. If one socket is full,
> roll over packets to another from the group. Maintain flow
> affinity during normal load using an rxhash fanout policy, while
> dispersing unexpected traffic storms that hit a single cpu, such
> as spoofed-source DoS flows. Rollover breaks affinity for flows
> arriving at saturated sockets during those conditions.
> 
> The patch adds a fanout policy ROLLOVER that rotates between sockets,
> filling each socket before moving to the next. It also adds a fanout
> flag ROLLOVER. If passed along with any other fanout policy, the
> primary policy is applied until the chosen socket is full. Then,
> rollover selects another socket, to delay packet drop until the
> entire system is saturated.
> 
> Probing sockets is not free. Selecting the last used socket, as
> rollover does, is a greedy approach that maximizes chance of
> success, at the cost of extreme load imbalance. In practice, with
> sufficiently long queues to absorb bursts, sockets are drained in
> parallel and load balance looks uniform in `top`.
> 
> To avoid contention, scales counters with number of sockets and
> accesses them lockfree. Values are bounds checked to ensure
> correctness.
> 
> Tested using an application with 9 threads pinned to CPUs, one socket
> per thread and sufficient busywork per packet operation to limits each
> thread to handling 32 Kpps. When sent 500 Kpps single UDP stream
> packets, a FANOUT_CPU setup processes 32 Kpps in total without this
> patch, 270 Kpps with the patch. Tested with read() and with a packet
> ring (V1).
> 
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* [PATCH net-next v2] packet: packet fanout rollover during socket overload
From: Willem de Bruijn @ 2013-03-18 23:07 UTC (permalink / raw)
  To: eric.dumazet, netdev, davem; +Cc: Willem de Bruijn

Minimize packet drop in a fanout group. If one socket is full,
roll over packets to another from the group. Maintain flow
affinity during normal load using an rxhash fanout policy, while
dispersing unexpected traffic storms that hit a single cpu, such
as spoofed-source DoS flows. Rollover breaks affinity for flows
arriving at saturated sockets during those conditions.

The patch adds a fanout policy ROLLOVER that rotates between sockets,
filling each socket before moving to the next. It also adds a fanout
flag ROLLOVER. If passed along with any other fanout policy, the
primary policy is applied until the chosen socket is full. Then,
rollover selects another socket, to delay packet drop until the
entire system is saturated.

Probing sockets is not free. Selecting the last used socket, as
rollover does, is a greedy approach that maximizes chance of
success, at the cost of extreme load imbalance. In practice, with
sufficiently long queues to absorb bursts, sockets are drained in
parallel and load balance looks uniform in `top`.

To avoid contention, scales counters with number of sockets and
accesses them lockfree. Values are bounds checked to ensure
correctness.

Tested using an application with 9 threads pinned to CPUs, one socket
per thread and sufficient busywork per packet operation to limits each
thread to handling 32 Kpps. When sent 500 Kpps single UDP stream
packets, a FANOUT_CPU setup processes 32 Kpps in total without this
patch, 270 Kpps with the patch. Tested with read() and with a packet
ring (V1).

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 include/linux/if_packet.h |   2 +
 net/packet/af_packet.c    | 112 ++++++++++++++++++++++++++++++++++++----------
 2 files changed, 90 insertions(+), 24 deletions(-)

diff --git a/include/linux/if_packet.h b/include/linux/if_packet.h
index f379929..20a77cc 100644
--- a/include/linux/if_packet.h
+++ b/include/linux/if_packet.h
@@ -54,6 +54,8 @@ struct sockaddr_ll {
 #define PACKET_FANOUT_HASH		0
 #define PACKET_FANOUT_LB		1
 #define PACKET_FANOUT_CPU		2
+#define PACKET_FANOUT_ROLLOVER		3
+#define PACKET_FANOUT_FLAG_ROLLOVER	0x1000
 #define PACKET_FANOUT_FLAG_DEFRAG	0x8000
 
 struct tpacket_stats {
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 1d3115c..8b0e8e6 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -244,6 +244,8 @@ struct packet_ring_buffer {
 
 struct packet_sock;
 static int tpacket_snd(struct packet_sock *po, struct msghdr *msg);
+static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
+		       struct packet_type *pt, struct net_device *orig_dev);
 
 static void *packet_previous_frame(struct packet_sock *po,
 		struct packet_ring_buffer *rb,
@@ -307,10 +309,11 @@ struct packet_fanout {
 	unsigned int		num_members;
 	u16			id;
 	u8			type;
-	u8			defrag;
+	u8			flags;
 	atomic_t		rr_cur;
 	struct list_head	list;
 	struct sock		*arr[PACKET_FANOUT_MAX];
+	int			next[PACKET_FANOUT_MAX];
 	spinlock_t		lock;
 	atomic_t		sk_ref;
 	struct packet_type	prot_hook ____cacheline_aligned_in_smp;
@@ -1093,11 +1096,11 @@ static void *packet_current_rx_frame(struct packet_sock *po,
 
 static void *prb_lookup_block(struct packet_sock *po,
 				     struct packet_ring_buffer *rb,
-				     unsigned int previous,
+				     unsigned int idx,
 				     int status)
 {
 	struct tpacket_kbdq_core *pkc  = GET_PBDQC_FROM_RB(rb);
-	struct tpacket_block_desc *pbd = GET_PBLOCK_DESC(pkc, previous);
+	struct tpacket_block_desc *pbd = GET_PBLOCK_DESC(pkc, idx);
 
 	if (status != BLOCK_STATUS(pbd))
 		return NULL;
@@ -1161,6 +1164,29 @@ static void packet_increment_head(struct packet_ring_buffer *buff)
 	buff->head = buff->head != buff->frame_max ? buff->head+1 : 0;
 }
 
+static bool packet_rcv_has_room(struct packet_sock *po, struct sk_buff *skb)
+{
+	struct sock *sk = &po->sk;
+	bool has_room;
+
+	if (po->prot_hook.func != tpacket_rcv)
+		return (atomic_read(&sk->sk_rmem_alloc) + skb->truesize)
+		       <= sk->sk_rcvbuf;
+
+	spin_lock(&sk->sk_receive_queue.lock);
+	if (po->tp_version == TPACKET_V3)
+		has_room = prb_lookup_block(po, &po->rx_ring,
+					po->rx_ring.prb_bdqc.kactive_blk_num,
+					TP_STATUS_KERNEL);
+	else
+		has_room = packet_lookup_frame(po, &po->rx_ring,
+					       po->rx_ring.head,
+					       TP_STATUS_KERNEL);
+	spin_unlock(&sk->sk_receive_queue.lock);
+
+	return has_room;
+}
+
 static void packet_sock_destruct(struct sock *sk)
 {
 	skb_queue_purge(&sk->sk_error_queue);
@@ -1186,16 +1212,16 @@ static int fanout_rr_next(struct packet_fanout *f, unsigned int num)
 	return x;
 }
 
-static struct sock *fanout_demux_hash(struct packet_fanout *f, struct sk_buff *skb, unsigned int num)
+static unsigned int fanout_demux_hash(struct packet_fanout *f,
+				      struct sk_buff *skb,
+				      unsigned int num)
 {
-	u32 idx, hash = skb->rxhash;
-
-	idx = ((u64)hash * num) >> 32;
-
-	return f->arr[idx];
+	return (((u64)skb->rxhash) * num) >> 32;
 }
 
-static struct sock *fanout_demux_lb(struct packet_fanout *f, struct sk_buff *skb, unsigned int num)
+static unsigned int fanout_demux_lb(struct packet_fanout *f,
+				    struct sk_buff *skb,
+				    unsigned int num)
 {
 	int cur, old;
 
@@ -1203,14 +1229,40 @@ static struct sock *fanout_demux_lb(struct packet_fanout *f, struct sk_buff *skb
 	while ((old = atomic_cmpxchg(&f->rr_cur, cur,
 				     fanout_rr_next(f, num))) != cur)
 		cur = old;
-	return f->arr[cur];
+	return cur;
+}
+
+static unsigned int fanout_demux_cpu(struct packet_fanout *f,
+				     struct sk_buff *skb,
+				     unsigned int num)
+{
+	return smp_processor_id() % num;
 }
 
-static struct sock *fanout_demux_cpu(struct packet_fanout *f, struct sk_buff *skb, unsigned int num)
+static unsigned int fanout_demux_rollover(struct packet_fanout *f,
+					  struct sk_buff *skb,
+					  unsigned int idx, unsigned int skip,
+					  unsigned int num)
 {
-	unsigned int cpu = smp_processor_id();
+	unsigned int i, j;
 
-	return f->arr[cpu % num];
+	i = j = min_t(int, f->next[idx], num - 1);
+	do {
+		if (i != skip && packet_rcv_has_room(pkt_sk(f->arr[i]), skb)) {
+			if (i != j)
+				f->next[idx] = i;
+			return i;
+		}
+		if (++i == num)
+			i = 0;
+	} while (i != j);
+
+	return idx;
+}
+
+static bool fanout_has_flag(struct packet_fanout *f, u16 flag)
+{
+	return f->flags & (flag >> 8);
 }
 
 static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
@@ -1219,7 +1271,7 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
 	struct packet_fanout *f = pt->af_packet_priv;
 	unsigned int num = f->num_members;
 	struct packet_sock *po;
-	struct sock *sk;
+	unsigned int idx;
 
 	if (!net_eq(dev_net(dev), read_pnet(&f->net)) ||
 	    !num) {
@@ -1230,23 +1282,31 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
 	switch (f->type) {
 	case PACKET_FANOUT_HASH:
 	default:
-		if (f->defrag) {
+		if (fanout_has_flag(f, PACKET_FANOUT_FLAG_DEFRAG)) {
 			skb = ip_check_defrag(skb, IP_DEFRAG_AF_PACKET);
 			if (!skb)
 				return 0;
 		}
 		skb_get_rxhash(skb);
-		sk = fanout_demux_hash(f, skb, num);
+		idx = fanout_demux_hash(f, skb, num);
 		break;
 	case PACKET_FANOUT_LB:
-		sk = fanout_demux_lb(f, skb, num);
+		idx = fanout_demux_lb(f, skb, num);
 		break;
 	case PACKET_FANOUT_CPU:
-		sk = fanout_demux_cpu(f, skb, num);
+		idx = fanout_demux_cpu(f, skb, num);
+		break;
+	case PACKET_FANOUT_ROLLOVER:
+		idx = fanout_demux_rollover(f, skb, 0, (unsigned int) -1, num);
 		break;
 	}
 
-	po = pkt_sk(sk);
+	po = pkt_sk(f->arr[idx]);
+	if (fanout_has_flag(f, PACKET_FANOUT_FLAG_ROLLOVER) &&
+	    unlikely(!packet_rcv_has_room(po, skb))) {
+		idx = fanout_demux_rollover(f, skb, idx, idx, num);
+		po = pkt_sk(f->arr[idx]);
+	}
 
 	return po->prot_hook.func(skb, dev, &po->prot_hook, orig_dev);
 }
@@ -1286,10 +1346,13 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 	struct packet_sock *po = pkt_sk(sk);
 	struct packet_fanout *f, *match;
 	u8 type = type_flags & 0xff;
-	u8 defrag = (type_flags & PACKET_FANOUT_FLAG_DEFRAG) ? 1 : 0;
+	u8 flags = type_flags >> 8;
 	int err;
 
 	switch (type) {
+	case PACKET_FANOUT_ROLLOVER:
+		if (type_flags & PACKET_FANOUT_FLAG_ROLLOVER)
+			return -EINVAL;
 	case PACKET_FANOUT_HASH:
 	case PACKET_FANOUT_LB:
 	case PACKET_FANOUT_CPU:
@@ -1314,7 +1377,7 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 		}
 	}
 	err = -EINVAL;
-	if (match && match->defrag != defrag)
+	if (match && match->flags != flags)
 		goto out;
 	if (!match) {
 		err = -ENOMEM;
@@ -1324,7 +1387,7 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 		write_pnet(&match->net, sock_net(sk));
 		match->id = id;
 		match->type = type;
-		match->defrag = defrag;
+		match->flags = flags;
 		atomic_set(&match->rr_cur, 0);
 		INIT_LIST_HEAD(&match->list);
 		spin_lock_init(&match->lock);
@@ -3311,7 +3374,8 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 			len = sizeof(int);
 		val = (po->fanout ?
 		       ((u32)po->fanout->id |
-			((u32)po->fanout->type << 16)) :
+			((u32)po->fanout->type << 16) |
+			((u32)po->fanout->flags << 24)) :
 		       0);
 		data = &val;
 		break;
-- 
1.8.1.3

^ permalink raw reply related

* Re: [PATCH net-next v2] packet: packet fanout rollover during socket overload
From: Willem de Bruijn @ 2013-03-18 23:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, David Miller
In-Reply-To: <1363648249.21184.11.camel@edumazet-glaptop>

On Mon, Mar 18, 2013 at 7:10 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2013-03-18 at 19:07 -0400, Willem de Bruijn wrote:
>> Minimize packet drop in a fanout group. If one socket is full,
>> roll over packets to another from the group. Maintain flow
>> affinity during normal load using an rxhash fanout policy, while
>> dispersing unexpected traffic storms that hit a single cpu, such
>> as spoofed-source DoS flows. Rollover breaks affinity for flows
>> arriving at saturated sockets during those conditions.
>>
>> The patch adds a fanout policy ROLLOVER that rotates between sockets,
>> filling each socket before moving to the next. It also adds a fanout
>> flag ROLLOVER. If passed along with any other fanout policy, the
>> primary policy is applied until the chosen socket is full. Then,
>> rollover selects another socket, to delay packet drop until the
>> entire system is saturated.
>>
>> Probing sockets is not free. Selecting the last used socket, as
>> rollover does, is a greedy approach that maximizes chance of
>> success, at the cost of extreme load imbalance. In practice, with
>> sufficiently long queues to absorb bursts, sockets are drained in
>> parallel and load balance looks uniform in `top`.
>>
>> To avoid contention, scales counters with number of sockets and
>> accesses them lockfree. Values are bounds checked to ensure
>> correctness.
>>
>> Tested using an application with 9 threads pinned to CPUs, one socket
>> per thread and sufficient busywork per packet operation to limits each
>> thread to handling 32 Kpps. When sent 500 Kpps single UDP stream
>> packets, a FANOUT_CPU setup processes 32 Kpps in total without this
>> patch, 270 Kpps with the patch. Tested with read() and with a packet
>> ring (V1).
>>
>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>> ---
>
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks Eric.

I need to rebase the patch once more to have it apply cleanly.

>
>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox