Netdev List
 help / color / mirror / Atom feed
* Re: netconf notes and materials
From: Peter P Waskiewicz Jr @ 2009-10-09 23:01 UTC (permalink / raw)
  To: David Miller; +Cc: billfink@mindspring.com, netdev@vger.kernel.org
In-Reply-To: <20091009.145151.130967664.davem@davemloft.net>

On Fri, 2009-10-09 at 14:51 -0700, David Miller wrote:
> From: Bill Fink <billfink@mindspring.com>
> Date: Fri, 9 Oct 2009 17:49:49 -0400
> 
> > How did you do the NUMA memory performance monitoring that was
> > presented on one of your slides?
> 
> Using proprietary internal tools Intel is unlikely to release.
> 
> On the bright side, some of those metrics will make their way into the
> 'perf' facilities in the kernel so they can be monitored, but not all
> of them.

Yes, they are tools written by our CPU and chipset teams to assist in
debug.  To reinforce what David just said, I know Jesse Barnes is
working hard with the Intel powers-that-be to get the "approved" public
PMU counters into the perf utility.  I'd imagine the PMU counters for
IOH memory throughput will be deemed ok, since it's not uncovering any
IP.

If I hear anything about any of the performance counters, I'll be sure
to forward that on.

Cheers,
-PJ


^ permalink raw reply

* Re: behaviour question for igb on nehalem box
From: Alexander Duyck @ 2009-10-09 23:20 UTC (permalink / raw)
  To: Chris Friesen
  Cc: e1000-list ; gospo@redhat.com, Linux Network Development list,
	Allan, Bruce W, Brandeburg, Jesse, Ronciak, John,
	Kirsher, Jeffrey T
In-Reply-To: <4ACFB9DF.9080909@nortel.com>

Chris Friesen wrote:
> On 10/09/2009 02:22 PM, Brandeburg, Jesse wrote:
>> On Fri, 9 Oct 2009, Chris Friesen wrote:
>>> I've got some general questions around the expected behaviour of the
>>> 82576 igb net device.  (On a dual quad-core Nehalem box, if it matters.)
> 
>> the hardware you have only supports 8 
>> queues (rx and tx) and the driver is configured to only set up 4 max.
> 
> The datasheet for the 82576 says 16 tx queues and 16 rx queues.  Is that
> a typo or do we have the economy version?

Actually the limitation is due to the fact that there are only 10 
interrupts available.  On kernels that support TX multi-queue the number 
of queues would be 4 TX and 4 RX, which would consume 8 interrupts 
leaving 1 for the link status change and one unused.

However on the kernel you are using I don't believe multi-queue NAPI is 
enabled so you shouldn't have multiple RX queues either.  On a 2.6.18 
kernel you should have only 1 RX and 1 TX queue unless you are using the 
driver provided on e1000.sourceforge.net which uses fake netdevs to 
support multi-queue NAPI.  I believe this may be a bug that was 
introduced when SR-IOV support was back-ported from the 2.6.30 kernel.

>>> My second question is around how the rx queues are mapped to interrupts.
>>>  According to /proc/interrupts there appears to be a 1:1 mapping between
>>> queues and interrupts.  However, I've set up at test with a given amount
>>> of traffic coming in to the device (from 4 different IP addresses and 4
>>> ports).  Under this scenario, "ethtool -S" shows the number of packets
>>> increasing for only rx queue 0, but I see the interrupt count going up
>>> for two interrupts.
>> one transmit interrupt and one receive interrupt?
> 
> No, two rx interrupts.  (Can't remember if the tx interrupt was going up
> as well or no...was only looking at rx.)

This may be due to the bug I mentioned above.  Multiple RX queues 
shouldn't be present on the 2.6.18 kernel as I do not believe 
multi-queue NAPI has been back-ported and it could have negative effects.

>> RSS will spread the 
>> receive work out in a flow based way, based on ip/xDP header.  Your test 
>> as described should be using more than one flow (and therefore more than 
>> one rx queue) unless you got caught out by the default arp_filter 
>> behavior (check arp -an).
> 
> I was surprised as well since it didn't match what I expected.  What's
> the story around the arp_filter?  I just logged onto the test box and
> "arp -an" gives:
> 
> ? (47.135.251.129) at 00:00:5E:00:01:08 [ether] on eth0
> 
> but I'm not sure that's worth anything since someone is running a test
> and it's currently using all four rx queues and all four rx interrupt
> counts are increasing.  I'll have to see if they changed anything.
> 
> 
>> Hope this helps,
> 
> That's great, thanks.
> 
> Chris
> 
> ------------------------------------------------------------------------------
> Come build with us! The BlackBerry(R) Developer Conference in SF, CA
> is the only developer event you need to attend this year. Jumpstart your
> developing skills, take BlackBerry mobile applications to market and stay 
> ahead of the curve. Join us from November 9 - 12, 2009. Register now!
> http://p.sf.net/sfu/devconference
> _______________________________________________
> E1000-devel mailing list
> E1000-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/e1000-devel


------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference

^ permalink raw reply

* Re: [PATCH] Generalize socket rx gap / receive queue overflow cmsg (v2)
From: Neil Horman @ 2009-10-09 23:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, socketcan
In-Reply-To: <4ACFABAE.5050003@gmail.com>

On Fri, Oct 09, 2009 at 11:31:26PM +0200, Eric Dumazet wrote:
> Neil Horman a écrit :
> 
> >  
> > +extern void __sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk,
> > +	struct sk_buff *skb);
> 
> Surely you meant __sock_recv_drops() ? It only deals with drops.
> 
No, I certainly meant both.  The defintion clearly handles both the timestamp
cmsg and the drops cmsg.  That way we don't need to make two calls in the
receive path for these

> 
> > +	case SO_RXQ_OVFL:
> > +		v.val = sock_flag(sk, SOCK_RXQ_OVFL);
> > +		break;
> > +
> 
> Hmm, I advise to use v.val = !!sock_flag(sk, SOCK_RXQ_OVFL);
> So that application gets 0 or 1, not 0 or some big value.
> Its better because it allows us to change internal SOCK_RXQ_OVFL if necessary in the future.
> 
I don't really see any difference, sock_flag is simply a wrapper around
test_bit.  I can change it if you really need, but it just looks like additional
operations to me

> >  drop_n_acct:
> > -	spin_lock(&sk->sk_receive_queue.lock);
> > -	po->stats.tp_drops++;
> > -	spin_unlock(&sk->sk_receive_queue.lock);
> > +	po->stats.tp_drops = atomic_inc_return(&sk->sk_drops);
> 
> Yes :)
> 
> >  EXPORT_SYMBOL_GPL(__sock_recv_timestamp);
> >  
> > +void __sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk,
> > +	struct sk_buff *skb)
> > +{
> > +	put_cmsg(msg, SOL_SOCKET, SO_RXQ_OVFL, sizeof(__u32), &skb->dropcount);
> > +}
> > +EXPORT_SYMBOL_GPL(__sock_recv_ts_and_drops);
> > +
> 
> Just change the name.
> 
No.  I'm differentiating from sock_recv_timestamp here, as I'm concerned about
the case in which we enqueue to sk_error_queue.  For those cases, in which
recvmsg is called with MSG_ERRQUEUE, I don't think its right to apply a cmsg
based on skb->dropcount, since the frame may have been from a tx path, or may
not have had dropcount set in the first place.  timestamp is still recorded
there, but I don't think we should mark the dropcount.

> And is it really too large to be inlined ?
> 
No, I was just following the style of sock_recv_timestamp.  

> In the contrary, sock_recv_timestamp() is so large that I suspect
> your sock_recv_ts_and_drops should *not* be inlined, and include inlined versions only :
> 
> I suggest something more orthogonal like :
> 
> void inline sock_recv_drops(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
> {
> 	if (sock_flag(sk, SOCK_RXQ_OVFL) && skb && skb->dropcount)
> 		put_cmsg(msg, SOL_SOCKET, SO_RXQ_OVFL,
> 			 sizeof(__u32), &skb->dropcount);
> }
> 
> void sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
> {
> 	sock_recv_timestamp(msg, sk, skb); // inlined
> 	sock_recv_drops(msg, sk, skb); // inlined
> }
> EXPORT_SYMBOL_GPL(sock_recv_ts_and_drops)
Fine.

^ permalink raw reply

* Re: netconf notes and materials
From: Krzysztof Halasa @ 2009-10-09 23:31 UTC (permalink / raw)
  To: Peter P Waskiewicz Jr
  Cc: David Miller, billfink@mindspring.com, netdev@vger.kernel.org
In-Reply-To: <1255129289.8937.39.camel@localhost.localdomain>

Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> writes:

> Yes, they are tools written by our CPU and chipset teams to assist in
> debug.  To reinforce what David just said, I know Jesse Barnes is
> working hard with the Intel powers-that-be to get the "approved" public
> PMU counters into the perf utility.  I'd imagine the PMU counters for
> IOH memory throughput will be deemed ok, since it's not uncovering any
> IP.

Ehm perhaps something similar could also happen to source of IXP4xx
microcode and/or NPE docs? Not sure about IP, though isn't any IP there
already patented (and thus available)?

This is especially needed WRT HSS (sync serial port).

:-)
-- 
Krzysztof Halasa

^ permalink raw reply

* Re: netconf notes and materials
From: Rick Jones @ 2009-10-09 23:35 UTC (permalink / raw)
  To: Peter P Waskiewicz Jr
  Cc: David Miller, billfink@mindspring.com, netdev@vger.kernel.org
In-Reply-To: <1255129289.8937.39.camel@localhost.localdomain>

Peter P Waskiewicz Jr wrote:
> On Fri, 2009-10-09 at 14:51 -0700, David Miller wrote:
> 
>>From: Bill Fink <billfink@mindspring.com>
>>Date: Fri, 9 Oct 2009 17:49:49 -0400
>>
>>
>>>How did you do the NUMA memory performance monitoring that was
>>>presented on one of your slides?
>>
>>Using proprietary internal tools Intel is unlikely to release.
>>
>>On the bright side, some of those metrics will make their way into the
>>'perf' facilities in the kernel so they can be monitored, but not all
>>of them.
> 
> 
> Yes, they are tools written by our CPU and chipset teams to assist in
> debug.  To reinforce what David just said, I know Jesse Barnes is
> working hard with the Intel powers-that-be to get the "approved" public
> PMU counters into the perf utility.  I'd imagine the PMU counters for
> IOH memory throughput will be deemed ok, since it's not uncovering any
> IP.
> 
> If I hear anything about any of the performance counters, I'll be sure
> to forward that on.

 From the standpoint of I/O, anything that might enable a port of:

http://pcitop.berlios.de/

would be goodness.

rick jones

^ permalink raw reply

* Re: [PATCH 2/8] bitmap: Introduce bitmap_set, bitmap_clear, bitmap_find_next_zero_area
From: Andrew Morton @ 2009-10-09 23:41 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: Fenghua Yu, Greg Kroah-Hartman, linux-ia64, Tony Luck, x86,
	netdev, linux-kernel, linux-altix, Yevgeny Petrilin,
	FUJITA Tomonori, linuxppc-dev, Ingo Molnar, Paul Mackerras,
	H. Peter Anvin, sparclinux, Thomas Gleixner, linux-usb,
	David S. Miller, Lothar Wassmann
In-Reply-To: <1255076961-21325-2-git-send-email-akinobu.mita@gmail.com>

On Fri,  9 Oct 2009 17:29:15 +0900
Akinobu Mita <akinobu.mita@gmail.com> wrote:

> This introduces new bitmap functions:
> 
> bitmap_set: Set specified bit area
> bitmap_clear: Clear specified bit area
> bitmap_find_next_zero_area: Find free bit area
> 
> These are stolen from iommu helper.
> 
> I changed the return value of bitmap_find_next_zero_area if there is
> no zero area.
> 
> find_next_zero_area in iommu helper: returns -1
> bitmap_find_next_zero_area: return >= bitmap size

I'll plan to merge this patch into 2.6.32 so we can trickle all the
other patches into subsystems in an orderly fashion.

> +void bitmap_set(unsigned long *map, int i, int len)
> +{
> +	int end = i + len;
> +
> +	while (i < end) {
> +		__set_bit(i, map);
> +		i++;
> +	}
> +}

This is really inefficient, isn't it?  It's a pretty trivial matter to
romp through memory 32 or 64 bits at a time.

> +EXPORT_SYMBOL(bitmap_set);
> +
> +void bitmap_clear(unsigned long *map, int start, int nr)
> +{
> +	int end = start + nr;
> +
> +	while (start < end) {
> +		__clear_bit(start, map);
> +		start++;
> +	}
> +}
> +EXPORT_SYMBOL(bitmap_clear);

Ditto.

> +unsigned long bitmap_find_next_zero_area(unsigned long *map,
> +					 unsigned long size,
> +					 unsigned long start,
> +					 unsigned int nr,
> +					 unsigned long align_mask)
> +{
> +	unsigned long index, end, i;
> +again:
> +	index = find_next_zero_bit(map, size, start);
> +
> +	/* Align allocation */
> +	index = (index + align_mask) & ~align_mask;
> +
> +	end = index + nr;
> +	if (end >= size)
> +		return end;
> +	i = find_next_bit(map, end, index);
> +	if (i < end) {
> +		start = i + 1;
> +		goto again;
> +	}
> +	return index;
> +}
> +EXPORT_SYMBOL(bitmap_find_next_zero_area);

This needs documentation, please.  It appears that `size' is the size
of the bitmap and `nr' is the number of zeroed bits we're looking for,
but an inattentive programmer could get those reversed.

Also the semantics of `align_mask' could benefit from spelling out.  Is
the alignment with respect to memory boundaries or with respect to
`map' or with respect to map+start or what?

And why does align_mask exist at all?  I was a bit surprised to see it
there.  In which scenarios will it be non-zero?

^ permalink raw reply

* Re: behaviour question for igb on nehalem box
From: Alexander Duyck @ 2009-10-09 23:48 UTC (permalink / raw)
  To: Chris Friesen
  Cc: e1000-list, Linux Network Development list, Allan, Bruce W,
	Brandeburg, Jesse, Ronciak, John, Kirsher, Jeffrey T,
	gospo@redhat.com
In-Reply-To: <4ACFC52F.4050509@intel.com>

Alexander Duyck wrote:
> Chris Friesen wrote:
>> On 10/09/2009 02:22 PM, Brandeburg, Jesse wrote:
>>> On Fri, 9 Oct 2009, Chris Friesen wrote:
>>>> I've got some general questions around the expected behaviour of the
>>>> 82576 igb net device.  (On a dual quad-core Nehalem box, if it matters.)
>>> the hardware you have only supports 8 
>>> queues (rx and tx) and the driver is configured to only set up 4 max.
>> The datasheet for the 82576 says 16 tx queues and 16 rx queues.  Is that
>> a typo or do we have the economy version?
> 
> Actually the limitation is due to the fact that there are only 10 
> interrupts available.  On kernels that support TX multi-queue the number 
> of queues would be 4 TX and 4 RX, which would consume 8 interrupts 
> leaving 1 for the link status change and one unused.
> 
> However on the kernel you are using I don't believe multi-queue NAPI is 
> enabled so you shouldn't have multiple RX queues either.  On a 2.6.18 
> kernel you should have only 1 RX and 1 TX queue unless you are using the 
> driver provided on e1000.sourceforge.net which uses fake netdevs to 
> support multi-queue NAPI.  I believe this may be a bug that was 
> introduced when SR-IOV support was back-ported from the 2.6.30 kernel.

Actually after looking closer at the Redhat source it looks like they 
have done the fake netdev workaround in their own code so I guess igb 
driver in the RHEL kernel does support multiple RX queues.

>>>> My second question is around how the rx queues are mapped to interrupts.
>>>>  According to /proc/interrupts there appears to be a 1:1 mapping between
>>>> queues and interrupts.  However, I've set up at test with a given amount
>>>> of traffic coming in to the device (from 4 different IP addresses and 4
>>>> ports).  Under this scenario, "ethtool -S" shows the number of packets
>>>> increasing for only rx queue 0, but I see the interrupt count going up
>>>> for two interrupts.
>>> one transmit interrupt and one receive interrupt?
>> No, two rx interrupts.  (Can't remember if the tx interrupt was going up
>> as well or no...was only looking at rx.)
> 
> This may be due to the bug I mentioned above.  Multiple RX queues 
> shouldn't be present on the 2.6.18 kernel as I do not believe 
> multi-queue NAPI has been back-ported and it could have negative effects.

The odds of any 2 flows overlapping when you are only using 4 flows is 
pretty high, especially if the addresses/ports are close in range.  You 
typically need something on the order of about 16 flows over a wide 
range of port numbers in order to get a good distribution.

Thanks,

Alex




------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference

^ permalink raw reply

* RE: [PATCH 2.6.32-rc3] net: VMware virtual Ethernet NIC driver: vmxnet3
From: Shreyas Bhatewara @ 2009-10-09 23:52 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jeff, pv-drivers, netdev, linux-kernel, Andrew, Wright,
	Anthony Liguori, Greg Kroah-Hartman, Chris, Morton,
	virtualization, Garzik, David S. Miller
In-Reply-To: <20091009143538.644844aa@nehalam>

> -----Original Message-----
> From: Stephen Hemminger [mailto:shemminger@linux-foundation.org]
> Sent: Friday, October 09, 2009 2:36 PM
> To: Shreyas Bhatewara
> Cc: linux-kernel; netdev; David S. Miller; Jeff Garzik; Anthony
> Liguori; Chris Wright; Greg Kroah-Hartman; Andrew Morton;
> virtualization; pv-drivers
> Subject: Re: [PATCH 2.6.32-rc3] net: VMware virtual Ethernet NIC
> driver: vmxnet3
> 
> On Thu, 8 Oct 2009 10:59:26 -0700 (PDT)
> Shreyas Bhatewara <sbhatewara@vmware.com> wrote:
> 
> > Hello all,
> >
> > I do not mean to be bothersome but this thread has been unusually
> silent.
> > Could you please review the patch for me and reply with your comments
> /
> > acks ?
> >
> > Thanks.
> > ->Shreyas
> 
> 
> Looks fine, but just a minor style nit (can be changed after insertion
> in mainline).
> 
> The code:
> 
> static void
> vmxnet3_do_poll(struct vmxnet3_adapter *adapter, int budget, int
> *txd_done,
> 		int *rxd_done)
> {
> 	if (unlikely(adapter->shared->ecr))
> 		vmxnet3_process_events(adapter);
> 
> 	*txd_done = vmxnet3_tq_tx_complete(&adapter->tx_queue, adapter);
> 	*rxd_done = vmxnet3_rq_rx_complete(&adapter->rx_queue, adapter,
> budget);
> }
> 
> 
> static int
> vmxnet3_poll(struct napi_struct *napi, int budget)
> {
> 	struct vmxnet3_adapter *adapter = container_of(napi,
> 					  struct vmxnet3_adapter, napi);
> 	int rxd_done, txd_done;
> 
> 	vmxnet3_do_poll(adapter, budget, &txd_done, &rxd_done);
> 
> 	if (rxd_done < budget) {
> 		napi_complete(napi);
> 		vmxnet3_enable_intr(adapter, 0);
> 	}
> 	return rxd_done;
> }
> 
> 
> Is simpler if you just have do_poll return rx done value. Probably Gcc
> inline's it all anyway.
> 
> static int
> vmxnet3_do_poll(struct vmxnet3_adapter *adapter, int budget)
> {
> 	if (unlikely(adapter->shared->ecr))
> 		vmxnet3_process_events(adapter);
> 
> 	vmxnet3_tq_tx_complete(&adapter->tx_queue, adapter);
> 	return vmxnet3_rq_rx_complete(&adapter->rx_queue, adapter,
> budget);
> }
> 
> 
> static int
> vmxnet3_poll(struct napi_struct *napi, int budget)
> {
> 	struct vmxnet3_adapter *adapter = container_of(napi,
> 					  struct vmxnet3_adapter, napi);
> 	int rxd_done;
> 
> 	rxd_done = vmxnet3_do_poll(adapter, budget);
> 	if (rxd_done < budget) {
> 		napi_complete(napi);
> 		vmxnet3_enable_intr(adapter, 0);
> 	}
> 	return rxd_done;
> }



Thanks Stephen.

Yes, the vmxnet3_do_poll() was an inline function in the very first patch. It was thought of as a better idea to let gcc handle the inlining.
I will piggyback this nit on a forthcoming change.

->Shreyas

^ permalink raw reply

* Re: [PATCH] Generalize socket rx gap / receive queue overflow cmsg (v3)
From: Neil Horman @ 2009-10-09 23:56 UTC (permalink / raw)
  To: netdev; +Cc: eric.dumazet, davem, socketcan, nhorman
In-Reply-To: <20091007180835.GB20524@hmsreliant.think-freely.org>

Ok, take 3 with Erics new notes

Change Notes:

1) Modified inlining of sock_recv_ts_and_drops to be more efficient

2) modify getsockopt for SO_RXQ_OVFL to gurantee only a 1 or 0 return

=============================================================


Create a new socket level option to report number of queue overflows

Recently I augmented the AF_PACKET protocol to report the number of frames lost
on the socket receive queue between any two enqueued frames.  This value was
exported via a SOL_PACKET level cmsg.  AFter I completed that work it was
requested that this feature be generalized so that any datagram oriented socket
could make use of this option.  As such I've created this patch, It creates a
new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue
overflowed between any two given frames.  It also augments the AF_PACKET
protocol to take advantage of this new feature (as it previously did not touch
sk->sk_drops, which this patch uses to record the overflow count).  Tested
successfully by me.

Notes:

1) Unlike my previous patch, this patch simply records the sk_drops value, which
is not a number of drops between packets, but rather a total number of drops.
Deltas must be computed in user space.

2) While this patch currently works with datagram oriented protocols, it will
also be accepted by non-datagram oriented protocols. I'm not sure if thats
agreeable to everyone, but my argument in favor of doing so is that, for those
protocols which aren't applicable to this option, sk_drops will always be zero,
and reporting no drops on a receive queue that isn't used for those
non-participating protocols seems reasonable to me.  This also saves us having
to code in a per-protocol opt in mechanism.

3) This applies cleanly to net-next assuming that commit
977750076d98c7ff6cbda51858bb5a5894a9d9ab (my af packet cmsg patch) is reverted

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>


 include/asm-generic/socket.h |    1 +
 include/linux/skbuff.h       |    6 ++++--
 include/net/sock.h           |    3 +++
 net/atm/common.c             |    2 +-
 net/bluetooth/af_bluetooth.c |    2 +-
 net/bluetooth/rfcomm/sock.c  |    2 +-
 net/can/bcm.c                |    2 +-
 net/can/raw.c                |    2 +-
 net/core/sock.c              |   17 ++++++++++++++++-
 net/ieee802154/dgram.c       |    2 +-
 net/ieee802154/raw.c         |    2 +-
 net/ipv4/raw.c               |    2 +-
 net/ipv4/udp.c               |    2 +-
 net/ipv6/raw.c               |    2 +-
 net/ipv6/udp.c               |    2 +-
 net/key/af_key.c             |    2 +-
 net/packet/af_packet.c       |    7 +++----
 net/rxrpc/ar-recvmsg.c       |    2 +-
 net/sctp/socket.c            |    2 +-
 net/socket.c                 |   16 ++++++++++++++++
 20 files changed, 57 insertions(+), 21 deletions(-)

diff --git a/include/asm-generic/socket.h b/include/asm-generic/socket.h
index 538991c..9a6115e 100644
--- a/include/asm-generic/socket.h
+++ b/include/asm-generic/socket.h
@@ -63,4 +63,5 @@
 #define SO_PROTOCOL		38
 #define SO_DOMAIN		39
 
+#define SO_RXQ_OVFL             40
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -389,8 +389,10 @@ struct sk_buff {
 #ifdef CONFIG_NETWORK_SECMARK
 	__u32			secmark;
 #endif
-
-	__u32			mark;
+	union {
+		__u32		mark;
+		__u32		dropcount;
+	};
 
 	__u16			vlan_tci;
 
diff --git a/include/net/sock.h b/include/net/sock.h
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -505,6 +505,7 @@ enum sock_flags {
 	SOCK_TIMESTAMPING_RAW_HARDWARE, /* %SOF_TIMESTAMPING_RAW_HARDWARE */
 	SOCK_TIMESTAMPING_SYS_HARDWARE, /* %SOF_TIMESTAMPING_SYS_HARDWARE */
 	SOCK_FASYNC, /* fasync() active */
+	SOCK_RXQ_OVFL,
 };
 
 static inline void sock_copy_flags(struct sock *nsk, struct sock *osk)
@@ -1493,6 +1494,8 @@ sock_recv_timestamp(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
 		sk->sk_stamp = kt;
 }
 
+extern void sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk, struct sk_buff *skb);
+
 /**
  * sock_tx_timestamp - checks whether the outgoing packet is to be time stamped
  * @msg:	outgoing packet
diff --git a/net/atm/common.c b/net/atm/common.c
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -496,7 +496,7 @@ int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	error = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
 	if (error)
 		return error;
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 	pr_debug("RcvM %d -= %d\n", atomic_read(&sk->sk_rmem_alloc), skb->truesize);
 	atm_return(vcc, skb->truesize);
 	skb_free_datagram(sk, skb);
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -257,7 +257,7 @@ int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 	skb_reset_transport_header(skb);
 	err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
 	if (err == 0)
-		sock_recv_timestamp(msg, sk, skb);
+		sock_recv_ts_and_drops(msg, sk, skb);
 
 	skb_free_datagram(sk, skb);
 
diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -703,7 +703,7 @@ static int rfcomm_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 		copied += chunk;
 		size   -= chunk;
 
-		sock_recv_timestamp(msg, sk, skb);
+		sock_recv_ts_and_drops(msg, sk, skb);
 
 		if (!(flags & MSG_PEEK)) {
 			atomic_sub(chunk, &sk->sk_rmem_alloc);
diff --git a/net/can/bcm.c b/net/can/bcm.c
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -1534,7 +1534,7 @@ static int bcm_recvmsg(struct kiocb *iocb, struct socket *sock,
 		return err;
 	}
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	if (msg->msg_name) {
 		msg->msg_namelen = sizeof(struct sockaddr_can);
diff --git a/net/can/raw.c b/net/can/raw.c
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -702,7 +702,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct socket *sock,
 		return err;
 	}
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	if (msg->msg_name) {
 		msg->msg_namelen = sizeof(struct sockaddr_can);
diff --git a/net/core/sock.c b/net/core/sock.c
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -276,6 +276,8 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
 	int err = 0;
 	int skb_len;
+	unsigned long flags;
+	struct sk_buff_head *list = &sk->sk_receive_queue;
 
 	/* Cast sk->rcvbuf to unsigned... It's pointless, but reduces
 	   number of warnings when compiling with -W --ANK
@@ -305,7 +307,10 @@ int sock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	 */
 	skb_len = skb->len;
 
-	skb_queue_tail(&sk->sk_receive_queue, skb);
+	spin_lock_irqsave(&list->lock, flags);
+	skb->dropcount = atomic_read(&sk->sk_drops);
+	__skb_queue_tail(list, skb);
+	spin_unlock_irqrestore(&list->lock, flags);
 
 	if (!sock_flag(sk, SOCK_DEAD))
 		sk->sk_data_ready(sk, skb_len);
@@ -702,6 +707,12 @@ set_rcvbuf:
 
 		/* We implement the SO_SNDLOWAT etc to
 		   not be settable (1003.1g 5.3) */
+	case SO_RXQ_OVFL:
+		if (valbool)
+			sock_set_flag(sk, SOCK_RXQ_OVFL);
+		else
+			sock_reset_flag(sk, SOCK_RXQ_OVFL);
+		break;
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -901,6 +912,10 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = sk->sk_mark;
 		break;
 
+	case SO_RXQ_OVFL:
+		v.val = !!sock_flag(sk, SOCK_RXQ_OVFL);
+		break;
+
 	default:
 		return -ENOPROTOOPT;
 	}
diff --git a/net/ieee802154/dgram.c b/net/ieee802154/dgram.c
--- a/net/ieee802154/dgram.c
+++ b/net/ieee802154/dgram.c
@@ -303,7 +303,7 @@ static int dgram_recvmsg(struct kiocb *iocb, struct sock *sk,
 	if (err)
 		goto done;
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	if (flags & MSG_TRUNC)
 		copied = skb->len;
diff --git a/net/ieee802154/raw.c b/net/ieee802154/raw.c
--- a/net/ieee802154/raw.c
+++ b/net/ieee802154/raw.c
@@ -191,7 +191,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	if (err)
 		goto done;
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	if (flags & MSG_TRUNC)
 		copied = skb->len;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -682,7 +682,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	if (err)
 		goto done;
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	/* Copy the address. */
 	if (sin) {
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -951,7 +951,7 @@ try_again:
 		UDP_INC_STATS_USER(sock_net(sk),
 				UDP_MIB_INDATAGRAMS, is_udplite);
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	/* Copy the address. */
 	if (sin) {
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -497,7 +497,7 @@ static int rawv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 			sin6->sin6_scope_id = IP6CB(skb)->iif;
 	}
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	if (np->rxopt.all)
 		datagram_recv_ctl(sk, msg, skb);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -252,7 +252,7 @@ try_again:
 					UDP_MIB_INDATAGRAMS, is_udplite);
 	}
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	/* Copy the address. */
 	if (msg->msg_name) {
diff --git a/net/key/af_key.c b/net/key/af_key.c
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3606,7 +3606,7 @@ static int pfkey_recvmsg(struct kiocb *kiocb,
 	if (err)
 		goto out_free;
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	err = (flags & MSG_TRUNC) ? skb->len : copied;
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -627,15 +627,14 @@ static int packet_rcv(struct sk_buff *skb, struct net_device *dev,
 
 	spin_lock(&sk->sk_receive_queue.lock);
 	po->stats.tp_packets++;
+	skb->dropcount = atomic_read(&sk->sk_drops);
 	__skb_queue_tail(&sk->sk_receive_queue, skb);
 	spin_unlock(&sk->sk_receive_queue.lock);
 	sk->sk_data_ready(sk, skb->len);
 	return 0;
 
 drop_n_acct:
-	spin_lock(&sk->sk_receive_queue.lock);
-	po->stats.tp_drops++;
-	spin_unlock(&sk->sk_receive_queue.lock);
+	po->stats.tp_drops = atomic_inc_return(&sk->sk_drops);
 
 drop_n_restore:
 	if (skb_head != skb->data && skb_shared(skb)) {
@@ -1478,7 +1477,7 @@ static int packet_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (err)
 		goto out_free;
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 
 	if (msg->msg_name)
 		memcpy(msg->msg_name, &PACKET_SKB_CB(skb)->sa,
diff --git a/net/rxrpc/ar-recvmsg.c b/net/rxrpc/ar-recvmsg.c
--- a/net/rxrpc/ar-recvmsg.c
+++ b/net/rxrpc/ar-recvmsg.c
@@ -146,7 +146,7 @@ int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
 				memcpy(msg->msg_name,
 				       &call->conn->trans->peer->srx,
 				       sizeof(call->conn->trans->peer->srx));
-			sock_recv_timestamp(msg, &rx->sk, skb);
+			sock_recv_ts_and_drops(msg, &rx->sk, skb);
 		}
 
 		/* receive the message */
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1958,7 +1958,7 @@ SCTP_STATIC int sctp_recvmsg(struct kiocb *iocb, struct sock *sk,
 	if (err)
 		goto out_free;
 
-	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_ts_and_drops(msg, sk, skb);
 	if (sctp_ulpevent_is_notification(event)) {
 		msg->msg_flags |= MSG_NOTIFICATION;
 		sp->pf->event_msgname(event, msg->msg_name, addr_len);
diff --git a/net/socket.c b/net/socket.c
--- a/net/socket.c
+++ b/net/socket.c
@@ -668,6 +668,22 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
 
 EXPORT_SYMBOL_GPL(__sock_recv_timestamp);
 
+inline void sock_recv_drops(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
+{
+	if (sock_flag(sk, SOCK_RXQ_OVFL) && skb && skb->dropcount)
+		put_cmsg(msg, SOL_SOCKET, SO_RXQ_OVFL,
+			sizeof(__u32), &skb->dropcount);
+}
+
+void sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk,
+	struct sk_buff *skb)
+{
+	sock_recv_timestamp(msg, sk, skb);
+	sock_recv_drops(msg, sk, skb);
+	put_cmsg(msg, SOL_SOCKET, SO_RXQ_OVFL, sizeof(__u32), &skb->dropcount);
+}
+EXPORT_SYMBOL_GPL(sock_recv_ts_and_drops);
+
 static inline int __sock_recvmsg(struct kiocb *iocb, struct socket *sock,
 				 struct msghdr *msg, size_t size, int flags)
 {

^ permalink raw reply related

* Re: bisect results of MSI-X related panic (help!)
From: Jesse Brandeburg @ 2009-10-10  0:24 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Frans Pop, Jesse Brandeburg, linux-kernel, netdev, Ingo Molnar,
	hpa
In-Reply-To: <4AAE105E.1080005@kernel.org>

On Mon, Sep 14, 2009 at 2:43 AM, Tejun Heo <tj@kernel.org> wrote:
> Tejun Heo wrote:
>> Frans Pop wrote:
>>> Jesse Brandeburg wrote:
>>>> I've bisected, here is my bisect log, problem is that the commit
>>>> identified is a merge commit, and *I don't know what to revert to test*.
>>>> It appears the parent of the merge:
>>>> 6e15cf04860074ad032e88c306bea656bbdd0f22 is marked good, but looks to be
>>>> in a possibly related area to the panic.
>>> That merge does contain quite a few merge fixups, so it's quite possible
>>> one of them is the cause of the failure.
>>> Maybe the simplest way to verify that is to compile both parents of the
>>> merge to doublecheck that they work OK. Then, if a compile of the merge
>>> itself is bad, the problem really is in the merge commit itself.
>>>
>>> That commit is the "percpu" merge, so I've added Tejun (author of most of
>>> that branch) and Ingo (merger) in CC.
>>
>> Sorry, the oops doesn't ring a bell, well, not yet at least.  It would
>> be great if the bisection can be narrowed down more.
>
> Also, building w/ debug option on, capturing more oops traces and
> pasting gdb output of l *<oops address> might shed some more light.

Okay, it has been a while and I have an update on this issue.  The
actual panic seems to have disappeared in 2.6.32-rc1(2), however, with
CONFIG_CC_STACKPROTECTOR=y, I am still panicking, the stack protector
fault shows only this message, no backtrace is listed:

Kernel stack is corrupted in: ffffffff810b5b31

I've built with a full debug kernel before this crash, so I did:

(gdb) l *0xffffffff810b5b31
0xffffffff810b5b31 is in move_native_irq (kernel/irq/migration.c:67).
62			return;
63	
64		desc->chip->mask(irq);
65		move_masked_irq(irq);
66		desc->chip->unmask(irq);
>>> 67	}
68	
(gdb) l move_native_irq
54	void move_native_irq(int irq)
55	{
56		struct irq_desc *desc = irq_to_desc(irq);
57	
58		if (likely(!(desc->status & IRQ_MOVE_PENDING)))
59			return;
60	
61		if (unlikely(desc->status & IRQ_DISABLED))
62			return;
63	
64		desc->chip->mask(irq);
65		move_masked_irq(irq);
66		desc->chip->unmask(irq);
67	}

So, this seems very related to my panic, as it is likely that
irqbalance or something else might try to move my interrupt from one
core to another and this seems likely related, and the original issue
as well as this one reproduce with LOTS of MSI-X vectors active.

- I tried connecting after the panic with kgdboc, no connection
- I tried kdump, but the same kernel I am using panics/hangs during
boot right after udev during the kexec() kernel boot (should I try
harder to get this working given it got so far?)
- I have ftrace function tracer running but no way to get at the log
post panic (wouldn't it be great if the kernel just dumped the ftrace
log on __stack_chk_fail?)

any other debugging tricks/ideas?

^ permalink raw reply

* Re: Real networking namespace
From: Stephen Hemminger @ 2009-10-10  2:08 UTC (permalink / raw)
  To: Paul Moore
  Cc: Stephen Smalley, linux-security-module, Al Viro, netdev,
	James Morris
In-Reply-To: <200910091812.16046.paul.moore@hp.com>

On Fri, 9 Oct 2009 18:12:15 -0400
Paul Moore <paul.moore@hp.com> wrote:

> On Friday 09 October 2009 12:44:52 pm Stephen Smalley wrote:
> > On Fri, 2009-10-09 at 12:37 -0400, Stephen Smalley wrote:
> > > On Fri, 2009-10-09 at 08:38 -0700, Stephen Hemminger wrote:
> > > > The existing networking namespace model is unattractive for what I
> > > > want, has anyone investigated better alternatives?
> > > >
> > > > I would like to be able to allow access to a network interface and
> > > > associated objects (routing tables etc), to be controlled by Mandatory
> > > > Access Control API's. I.e grant access to eth0 and to only certain
> > > > processes.  Some the issues with the existing models are:
> > > >   * eth0 and associated objects don't really exist in filesystem so
> > > >     not subject to LSM style control (SeLinux/SMACK/TOMOYO)
> 
> As Stephen points out, SELinux does have the ability to assign security labels 
> to network interfaces, check out the 'semanage' command.  A while back I wrote 
> up something about the SELinux network "ingress/egress" access controls:
> 
>  * http://paulmoore.livejournal.com/2128.html

I was hoping to be able to not have inaccessible interfaces visible,
is it possible to not have interfaces show up in commands like:
  ip link show
or sysfs?


-- 

^ permalink raw reply

* [PATCH] net,bonding: Add return statement in bond_create_proc_entry.
From: Rakib Mullick @ 2009-10-10  2:10 UTC (permalink / raw)
  To: Jay Vosburgh, netdev, linux-kernel, Andrew Morton; +Cc: bonding-devel

The function bond_create_proc_entry supposed to return int instead of void.
And fixes the following compilation warning.

drivers/net/bonding/bond_main.c: In function `bond_create_proc_entry':
drivers/net/bonding/bond_main.c:3393: warning: control reaches end of
non-void function

---
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>

--- linus/drivers/net/bonding/bond_main.c	2009-10-09 17:38:35.000000000 +0600
+++ rakib/drivers/net/bonding/bond_main.c	2009-10-09 17:47:46.000000000 +0600
@@ -3391,6 +3391,7 @@ static void bond_destroy_proc_dir(void)

 static int bond_create_proc_entry(struct bonding *bond)
 {
+	return 0;
 }

 static void bond_remove_proc_entry(struct bonding *bond)

^ permalink raw reply

* Re: PATCH: Network Device Naming mechanism and policy
From: Stephen Hemminger @ 2009-10-10  2:44 UTC (permalink / raw)
  To: Matt Domsch; +Cc: netdev, linux-hotplug, Narendra_K, jordan_hargrave
In-Reply-To: <20091009210909.GA9836@auslistsprd01.us.dell.com>

On Fri, 9 Oct 2009 16:09:09 -0500
Matt Domsch <Matt_Domsch@dell.com> wrote:

> On Fri, Oct 09, 2009 at 09:00:01AM -0500, Narendra K wrote:
> > On Fri, Oct 09, 2009 at 07:12:07PM +0530, K, Narendra wrote:
> > > > example udev config:
> > > > SUBSYSTEM=="net",
> > > SYMLINK+="net/by-mac/$sysfs{ifindex}.$sysfs{address}"
> > > 
> > > work as well.  But coupling the ifindex to the MAC address like this
> > > doesn't work.  (In general, coupling any two unrelated attributes when
> > > trying to do persistent names doesn't work.)
> > > 
> > Attaching the latest patch incorporating review comments.
> 
> Same patch, rebased to linux-next.
> 
> By creating character devices for every network device, we can use
> udev to maintain alternate naming policies for devices, including
> additional names for the same device, without interfering with the
> name that the kernel assigns a device.
> 
> This is conditionalized on CONFIG_NET_CDEV.  If enabled (the default),
> device nodes will automatically be created in /dev/netdev/ for each
> network device.  (/dev/net/ is already populated by the tun device.)
> 
> These device nodes are not functional at the moment - open() returns
> -ENOSYS.  Their only purpose is to provide userspace with a kernel
> name to ifindex mapping, in a form that udev can easily manage.
> 
> Signed-off-by: Jordan Hargrave <Jordan_Hargrave@dell.com>
> Signed-off-by: Narendra K <Narendra_K@dell.com>
> Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>

Maybe I'm dense but can't see why having a useless /dev/net/ symlinks
is a good interface choice. Perhaps you should explain the race between
PCI scan and udev in more detail, and why solving it in either of those
places won't work. As it stands you are proposing yet another wart to
the already complex set of network interface API's which has implications
for security as well as increasing the number of possible bugs.

-- 

^ permalink raw reply

* Re: [PATCH] net: Add netdev_alloc_skb_ip_align() helper
From: David Miller @ 2009-10-10  3:43 UTC (permalink / raw)
  To: eric.dumazet; +Cc: thomas, netdev, thierry.reding, nios2-dev, linux-kernel
In-Reply-To: <4ACF6562.9040109@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 9 Oct 2009 18:31:30 +0200

> David Miller a écrit :
> 
>> Looks ok, but I want to look at how often this exact sequence
>> would match.  If it applies to a lot of cases, I'll add this
>> but I know of many exceptions in my head already :-)
> 
> Well, it was more as a reference. I believe about 20-30 call sites
> could use it. Do you want me to provide a combo patch ?

No, that's not necessary.

^ permalink raw reply

* Re: PATCH: Network Device Naming mechanism and policy
From: Matt Domsch @ 2009-10-10  4:40 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, linux-hotplug, Narendra_K, jordan_hargrave
In-Reply-To: <20091009194401.036da080@nehalam>

On Fri, Oct 09, 2009 at 07:44:01PM -0700, Stephen Hemminger wrote:
> Maybe I'm dense but can't see why having a useless /dev/net/ symlinks
> is a good interface choice. Perhaps you should explain the race between
> PCI scan and udev in more detail, and why solving it in either of those
> places won't work. As it stands you are proposing yet another wart to
> the already complex set of network interface API's which has implications
> for security as well as increasing the number of possible bugs.

The fundamental challenge is that system administrators, particularly
those of server-class hardware with multiple network ports present
(some on the motherboard, some on add-in cards), have the
not-so-unreasonable expectation that there is a deterministic mapping
between those ports and the name one uses to address those ports.

The fundamental roadblock to this is that enumeration != naming,
except that it is for network devices, and we keep changing the
enumeration order.

Today, port naming is completely nondeterministic.  If you have but
one NIC, there are few chances to get the name wrong (it'll be eth0).
If you have >1 NIC, chances increase to get it wrong.

The complexity arises at multiple levels.

First, device driver load order.  In the 2.4 kernel days, and even
mostly early 2.6 kernel days, the order in which network drivers
loaded played a role in determining the name of the device.  Drivers
loaded first would get their devices named first.  If I have two types
of devices, say an e100-driven NIC and a tg3-driven NIC, I could
figure out that the names would be eth0=e100 and eth1=tg3 by setting
the load order in /etc/modules.conf (now modprobe.conf).  If I wanted
the other order, fine, just switch it around in modules.conf and
reboot.  OS installers, being the first running instance of Linux,
before modprobe.conf existed to set that ordering, had to have other
mechanisms to load drivers (often manually, or if programmatically
such as in a kickstart or autoyast file, was still somewhat fixed).

With the advent of modaliases + udev, now modprobe.conf doesn't
contain this ordering anymore, and udev loads the drivers.  So while
it wasn't perfect, it was better than nothing, and that's gone now.

It gets even worse as, to speed up boot time, modprobes can be run in
parallel, and even within individual drivers, the NICs get initialized
(and named) in parallel.  Further confusing things, some devices need
firmware loaded into them before getting names assigned, which is done
from userspace, and they race.

Second, PCI device list order.  In the 2.4 kernel days, the PCI device
list was scanned "breadth-first" (for each bus; for each device; for
each function; do load...).  FWIW, Windows still does this.  It gives
BIOS, which assigns PCI bus numbers, a chance to put LOMs at a lower
bus number than add-in cards.  Module load order still mattered, but
at least if you had say 2 e1000 ports as LOMs, and 2 e1000 ports on
add-in cards, you pretty much knew the ordering would be eth0 as
lowest bdf on the motherboard, eth1 as next bdf on the motherboard,
and eth2 and 3 as the add-in cards in ascending slot order.

With the advent of PCI hot plug in the 2.5 kernel series, the
breadth-first ordering became depth-first.    (for each bus; for each
device; if the device is a bridge, scan the busses behind it.).  This
caused NICs on bus 0 device 5, and bus 1 device 3, (eth0 and 1
respectively) to be enumerated differently due to the  a bridge from
bus 0 to bus 1 at 0:4.  My crude hack of pci=bfsort, with some dmi
strings to match and auto-enable, at least reverted this back to the
ordering the 2.4 kernel and Windows used.  Now we have to keep adding
systems to this DMI list (Dell has a number of systems on this list
today; HP has even more).  And it doesn't completely solve the
problem, just masks it.

So, to address the ordering problem, I placed a constraint on our
server hardware teams, forcing them to lay out their boards and assign
PCIe lanes and bus numbers, such that at least the designed "first"
LOM would get found first in either depth-first or breadth-first
order.  Our 10G and 11G servers have this restriction in place, though
it wasn't easy.  And it's gotten even harder, as the PCIe switches
expand the number of lanes available.  We no longer have the
traditional tiered buses architecture, but the PCI layer for this
purpose thinks we do.  I need to remove this constraint on the
hardware teams - it's gotten to be impossible for the chipset lanes to
be laid out efficiently with this constraint.

All of the above just papered over the enumeration != naming problem.

Third, stateless computing is becoming more and more commonplace.  The
Field Replaceable Unit is the server itself.  Got a bad server?  Pull
it out, move the disks to an identical unit, insert the new server,
and go.  Fix the bad server offline and bring it back.  In this model,
having MAC addresses as the mechanism that is providing the
determinism (/etc/mactab or udev persistent naming rules) breaks,
because the MAC addresses of the ports on the new server won't be the
same as on the old server.  HP even has a technology to solve _this_
problem (in their blade chassis) - Virtual Connect.  The MACs get
assigned by the chassis to the blades at POST, and are fixed to the
slot.  Slick, and Dell has an even more flexible similar feature
FlexAddress.  This doesn't solve the OS installer problem of "which of
these NICs should I use to do an install?" but it does recognize the
problem space and tries to overcome it.

Fourth, for OS installers, choosing which NIC to use at installtime,
when all the NICs are plugged in, can be difficult.  PXE environments,
using pxelinux and its IPAPPEND 2 option, will append
"BOOTIF=xx:xx:xx:xx:xx:xx" to the kernel command line, that
containing the MAC address of the NIC used for PXE.  Neat trick.  Yes,
we then had to teach the OS installers to recognize and use this.  But
it only works if you PXE boot, and only for that one NIC.

Fifth, network devices can have only a single name.  eth0.  If we look
at disks, we see udev manages a tree of symlinks for
/dev/disk/by-label, /dev/disk/by-path, /dev/disk/by-uuid. And as a
system admin, if I wanted to also create a udev rule for
/dev/disk/by-function (boot, swap, mattsstorage), it's trivial to do
so.  Why can't we have this flexibility for network devices too?

So, how do we get deterministic naming for all the NICs in a system?
That's what I'm going for.  Picture a network switch, with several
blades, and several ports on each blade.  The network admin addresses
each port as say 1/16 (the 16th port on blade 1, clearly labeled).
The parallel on servers is the chassis label printed on the outside
(say, "Gb1").  But due the above, there is no guarantee, and in fact
little chance, that Gb1 will be consistently named eth0 - it may vary
from boot to boot.  That's full of fail.

For a concrete example, the 4 bnx2 chips in my PowerEdge R610 with a
current 2.6 kernel, loading only one driver, the ports get assigned
names in nondeterministic order on each boot.  Given that the
ifcfg-eth* rules, netfilter rules, and the rest all expect
deterministic naming, massive failure ensues unless some form of
determinism is brought back in.

The idea to use a character device node to expose the ifindex value,
and udev to manage a tree of symlinks to it, really follows the model
used today for disks.  It allows us to get deterministic names for
devices (albeit, the names are symlinks), and multiple names for
devices (through multiple symlink rules).  That some people want to
use the char device to call ioctl() and read/write, as is possible on
the BSDs, would just be gravy IMHO.

It does require a change in behavior for a system administrator.
Instead of hard-coding 'eth0' into her scripts, she uses
'/dev/net/by-function/boot' or somesuch.  But then that name is
guaranteed to always refer to the "right" NIC.  Every admin I've
spoken to is willing to make this kind of change, as long as they get
the consistent, deterministic naming they expect but don't have
today.  And it does require patching userspace apps to take both a
kernel device name, or a path, and to resolve the path to device name
or ifindex.  We wrote libnetdevname (really, one function), and have
patches for several userspace apps to use it, to prove it can be done.

One alternative would be to do something using the sysfs ifindex value
already exported.  e.g.
  /sys/devices/pci0000:00/0000:00:05.0/0000:05:00.0/0000:06:07.0/net/eth0/ifindex

but we have never had symlinks from /dev into /sys before (doesn't
mean we couldn't though).  In that case, udev would grow to manage
/dev/net/by-chassis-label/Embedded_NIC_1 -> /sys/devices/.../net/eth0,
and libnetdevname would be used to follow the symlink in applications.
This approach could solve my problem without (many or any?) kernel
changes needed, but wouldn't help those who want to do
ioctl/read/write to a devnode.

Given the problem, I really do need a solution.  I've proposed one
method, and an alternative, but I can't afford to let the problem stay
unaddressed any longer, and need a clear direction to be chosen.  The
char device gives me what I need, and others what they want also.

Thanks for listening to the diatribe.  For more examples and
workarounds that we've been telling our customers for several years,
check out http://linux.dell.com/papers.shtml for the Network Interface
Card Naming whitepaper.


-- 
Matt Domsch
Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux

^ permalink raw reply

* Re: [PATCH] Generalize socket rx gap / receive queue overflow cmsg (v3)
From: Eric Dumazet @ 2009-10-10  4:59 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, davem, socketcan
In-Reply-To: <20091009235631.GA31501@hmsreliant.think-freely.org>

Neil Horman a écrit :
> Ok, take 3 with Erics new notes
> 
> Change Notes:
> 
> 1) Modified inlining of sock_recv_ts_and_drops to be more efficient
> 
> 2) modify getsockopt for SO_RXQ_OVFL to gurantee only a 1 or 0 return
> 
> =============================================================
> 
> 
> Create a new socket level option to report number of queue overflows
> 
> Recently I augmented the AF_PACKET protocol to report the number of frames lost
> on the socket receive queue between any two enqueued frames.  This value was
> exported via a SOL_PACKET level cmsg.  AFter I completed that work it was
> requested that this feature be generalized so that any datagram oriented socket
> could make use of this option.  As such I've created this patch, It creates a
> new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
> SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue
> overflowed between any two given frames.  It also augments the AF_PACKET
> protocol to take advantage of this new feature (as it previously did not touch
> sk->sk_drops, which this patch uses to record the overflow count).  Tested
> successfully by me.
> 
> Notes:
> 
> 1) Unlike my previous patch, this patch simply records the sk_drops value, which
> is not a number of drops between packets, but rather a total number of drops.
> Deltas must be computed in user space.
> 
> 2) While this patch currently works with datagram oriented protocols, it will
> also be accepted by non-datagram oriented protocols. I'm not sure if thats
> agreeable to everyone, but my argument in favor of doing so is that, for those
> protocols which aren't applicable to this option, sk_drops will always be zero,
> and reporting no drops on a receive queue that isn't used for those
> non-participating protocols seems reasonable to me.  This also saves us having
> to code in a per-protocol opt in mechanism.
> 
> 3) This applies cleanly to net-next assuming that commit
> 977750076d98c7ff6cbda51858bb5a5894a9d9ab (my af packet cmsg patch) is reverted
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

I read your patch and found one error (at the very end)

Feel free to resubmit with my

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

>  
> +inline void sock_recv_drops(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
> +{
> +	if (sock_flag(sk, SOCK_RXQ_OVFL) && skb && skb->dropcount)
> +		put_cmsg(msg, SOL_SOCKET, SO_RXQ_OVFL,
> +			sizeof(__u32), &skb->dropcount);
> +}
> +
> +void sock_recv_ts_and_drops(struct msghdr *msg, struct sock *sk,
> +	struct sk_buff *skb)
> +{
> +	sock_recv_timestamp(msg, sk, skb);
> +	sock_recv_drops(msg, sk, skb);


> +	put_cmsg(msg, SOL_SOCKET, SO_RXQ_OVFL, sizeof(__u32), &skb->dropcount);
It's already part of sock_recv_drops()


> +}
> +EXPORT_SYMBOL_GPL(sock_recv_ts_and_drops);
> +


^ permalink raw reply

* Re: [PATCH] Generalize socket rx gap / receive queue overflow cmsg (v3)
From: Eric Dumazet @ 2009-10-10  5:12 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, davem, socketcan
In-Reply-To: <20091009235631.GA31501@hmsreliant.think-freely.org>

Neil Horman a écrit :
> Ok, take 3 with Erics new notes
> 
> Change Notes:
> 
> 1) Modified inlining of sock_recv_ts_and_drops to be more efficient
> 
> 2) modify getsockopt for SO_RXQ_OVFL to gurantee only a 1 or 0 return
> 
> =============================================================
> 
> 
> Create a new socket level option to report number of queue overflows
> 
> Recently I augmented the AF_PACKET protocol to report the number of frames lost
> on the socket receive queue between any two enqueued frames.  This value was
> exported via a SOL_PACKET level cmsg.  AFter I completed that work it was
> requested that this feature be generalized so that any datagram oriented socket
> could make use of this option.  As such I've created this patch, It creates a
> new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
> SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue
> overflowed between any two given frames.  It also augments the AF_PACKET
> protocol to take advantage of this new feature (as it previously did not touch
> sk->sk_drops, which this patch uses to record the overflow count).  Tested
> successfully by me.
> 
> Notes:

> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -951,7 +951,7 @@ try_again:
>  		UDP_INC_STATS_USER(sock_net(sk),
>  				UDP_MIB_INDATAGRAMS, is_udplite);
>  
> -	sock_recv_timestamp(msg, sk, skb);
> +	sock_recv_ts_and_drops(msg, sk, skb);
>  
>  	/* Copy the address. */
>  	if (sin) {


As a followup to my patch about udp_poll(), I realize we dont atomic_inc(&sk->sk_drops)
in the event a packet is dropped because of a bad checksum.

I'll post a fixup once David reviewed previous patch


^ permalink raw reply

* Re: PATCH: Network Device Naming mechanism and policy
From: Greg KH @ 2009-10-10  5:23 UTC (permalink / raw)
  To: Matt Domsch
  Cc: Stephen Hemminger, netdev, linux-hotplug, Narendra_K,
	jordan_hargrave
In-Reply-To: <20091010044056.GA5350@mock.linuxdev.us.dell.com>

On Fri, Oct 09, 2009 at 11:40:57PM -0500, Matt Domsch wrote:
> The fundamental roadblock to this is that enumeration != naming,
> except that it is for network devices, and we keep changing the
> enumeration order.

No, the hardware changes the enumeration order, it places _no_
guarantees on what order stuff will be found in.  So this is not the
kernel changing, just to be clear.

Again, I have a machine here that likes to reorder PCI devices every 4th
or so boot times, and that's fine according to the PCI spec.  Yeah, it's
a crappy BIOS, but the manufacturer rightly pointed out that it is not
in violation of anything.

> Today, port naming is completely nondeterministic.  If you have but
> one NIC, there are few chances to get the name wrong (it'll be eth0).
> If you have >1 NIC, chances increase to get it wrong.

That is why all distros name network devices based on the only
deterministic thing they have today, the MAC address.  I still fail to
see why you do not like this solution, it is honestly the only way to
properly name network devices in a sane manner.

All distros also provide a way to easily rename the network devices, to
place a specific name on a specific MAC address, so again, this should
all be solved already.

No matter how badly your BIOS teams mess up the PCI enumeration order :)

thanks,

greg k-h

^ permalink raw reply

* Re: PATCH: Network Device Naming mechanism and policy
From: Sujit K M @ 2009-10-10  8:17 UTC (permalink / raw)
  To: Greg KH
  Cc: Matt Domsch, Stephen Hemminger, netdev, linux-hotplug, Narendra_K,
	jordan_hargrave
In-Reply-To: <20091010052308.GA12458@kroah.com>

Greg,


> No, the hardware changes the enumeration order, it places _no_
> guarantees on what order stuff will be found in.  So this is not the
> kernel changing, just to be clear.
> Again, I have a machine here that likes to reorder PCI devices every 4th
> or so boot times, and that's fine according to the PCI spec.  Yeah, it's
> a crappy BIOS, but the manufacturer rightly pointed out that it is not
> in violation of anything.
>

I think the open call should be implemented then. By the patch very little
knowledge is being shared on type of network implementation it is trying to
do.Also it is messing with core datastructure and procedures. This seems
to be simplified by changing implementing the other operations like poll().

> That is why all distros name network devices based on the only
> deterministic thing they have today, the MAC address.  I still fail to
> see why you do not like this solution, it is honestly the only way to
> properly name network devices in a sane manner.

This is feature that needs to be implemented. As per the rules followed.

>
> All distros also provide a way to easily rename the network devices, to
> place a specific name on a specific MAC address, so again, this should
> all be solved already.
>
> No matter how badly your BIOS teams mess up the PCI enumeration order :)

This is an problem, But I think this can be solved by implementing some of the
routines in the network device.

^ permalink raw reply

* Re: [PATCH 1/3] iwmc3200top: Add Intel Wireless MultiCom 3200 top driver.
From: Tomas Winkler @ 2009-10-10  9:38 UTC (permalink / raw)
  To: davem, linville, netdev, linux-wireless, linux-mmc
  Cc: yi.zhu, inaky.perez-gonzalez, cindy.h.kao, guy.cohen,
	ron.rindjunsky, Tomas Winkler
In-Reply-To: <1253662724-16497-2-git-send-email-tomas.winkler@intel.com>

On Wed, Sep 23, 2009 at 1:38 AM, Tomas Winkler <tomas.winkler@intel.com> wrote:
> This patch adds Intel Wireless MultiCom 3200 top driver.
> IWMC3200 is 4Wireless Com CHIP (GPS/BT/WiFi/WiMAX).
> Top driver is responsible for device initialization and firmware download.
> Firmware handled by top is responsible for top itself and
> as well as bluetooth and GPS coms. (Wifi and WiMax provide their own
> firmware)
> In addition top driver is used to retrieve firmware logs
> and supports other debugging features
>
> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>

Dave,
are you going to merge this one?
iwmc3200wifi and i200m-sdio which are already merged are usless without it

Thanks
Tomas

^ permalink raw reply

* Re: [BUG net-2.6] bluetooth/rfcomm : sleeping function called from invalid context at mm/slub.c:1719
From: Oliver Hartkopp @ 2009-10-10  9:54 UTC (permalink / raw)
  To: Dave Young
  Cc: Gustavo F. Padovan, Marcel Holtmann, Linux Netdev List,
	linux-bluetooth, Gustavo F. Padovan
In-Reply-To: <20091009004452.GA2395@darkstar>

Dave Young wrote:
> On Sun, Oct 04, 2009 at 06:06:35PM +0000, Gustavo F. Padovan wrote:
>> Hi all,
>>
>> * Dave Young <hidave.darkstar@gmail.com> [2009-10-04 11:26:17 +0800]:
>>
>>> I can reproduce the bug.
>>>
>>> It's probably caused by the l2cap changes by  Gustavo F. Padovan
>>> <gustavo@las.ic.unicamp.br>, I didn't see such problem after reverting
>>> Gustavo's patch series.
>> I can't reproduce the bug. I'm trying to reproduce it to figure out what of
>> my changes cause it.
>>
>> I' running
>>
>> $ dund -snu -i 00:11:67:CD:0F:CB # to pretend to be dialup/telephone
>>
>> and on the other side 
>>
>> $ rfcomm bind 0 00:11:67:CD:0F:CB 1
>> $ wvdial  # wvdial to /dev/rfcomm0
>>
>> Both sides are on the same machine. Do you see any real difference
>> between my try and the call that get the bug?
>>
> 
> Hi oliver
> 
> Could try following patch?

I did. It fixed it on the base of the latest net-2.6 tree :-)

Tested-by: Oliver Hartkopp <oliver@hartkopp.net>

Together with the previous patch from http://patchwork.kernel.org/patch/51326/
the reported bluetooth/rfcomm regressions should be fixed IMO.

Thanks for your continuous work on this, Dave!

Regards,
Oliver

>
> ---
> 
> When shutdown ppp connection, lockdep waring about non-static key
> will happen, it is caused by the lock is not initialized properly
> at that time.
> 
> Fix with tuning the lock/skb_queue_head init order
> 
> [   94.339261] INFO: trying to register non-static key.
> [   94.342509] the code is fine but needs lockdep annotation.
> [   94.342509] turning off the locking correctness validator.
> [   94.342509] Pid: 0, comm: swapper Not tainted 2.6.31-mm1 #2
> [   94.342509] Call Trace:
> [   94.342509]  [<c0248fbe>] register_lock_class+0x58/0x241
> [   94.342509]  [<c024b5df>] ? __lock_acquire+0xb57/0xb73
> [   94.342509]  [<c024ab34>] __lock_acquire+0xac/0xb73
> [   94.342509]  [<c024b7fa>] ? lock_release_non_nested+0x17b/0x1de
> [   94.342509]  [<c024b662>] lock_acquire+0x67/0x84
> [   94.342509]  [<c04cd1eb>] ? skb_dequeue+0x15/0x41
> [   94.342509]  [<c054a857>] _spin_lock_irqsave+0x2f/0x3f
> [   94.342509]  [<c04cd1eb>] ? skb_dequeue+0x15/0x41
> [   94.342509]  [<c04cd1eb>] skb_dequeue+0x15/0x41
> [   94.342509]  [<c054a648>] ? _read_unlock+0x1d/0x20
> [   94.342509]  [<c04cd641>] skb_queue_purge+0x14/0x1b
> [   94.342509]  [<fab94fdc>] l2cap_recv_frame+0xea1/0x115a [l2cap]
> [   94.342509]  [<c024b5df>] ? __lock_acquire+0xb57/0xb73
> [   94.342509]  [<c0249c04>] ? mark_lock+0x1e/0x1c7
> [   94.342509]  [<f8364963>] ? hci_rx_task+0xd2/0x1bc [bluetooth]
> [   94.342509]  [<fab95346>] l2cap_recv_acldata+0xb1/0x1c6 [l2cap]
> [   94.342509]  [<f8364997>] hci_rx_task+0x106/0x1bc [bluetooth]
> [   94.342509]  [<fab95295>] ? l2cap_recv_acldata+0x0/0x1c6 [l2cap]
> [   94.342509]  [<c02302c4>] tasklet_action+0x69/0xc1
> [   94.342509]  [<c022fbef>] __do_softirq+0x94/0x11e
> [   94.342509]  [<c022fcaf>] do_softirq+0x36/0x5a
> [   94.342509]  [<c022fe14>] irq_exit+0x35/0x68
> [   94.342509]  [<c0204ced>] do_IRQ+0x72/0x89
> [   94.342509]  [<c02038ee>] common_interrupt+0x2e/0x34
> [   94.342509]  [<c024007b>] ? pm_qos_add_requirement+0x63/0x9d
> [   94.342509]  [<c038e8a5>] ? acpi_idle_enter_bm+0x209/0x238
> [   94.342509]  [<c049d238>] cpuidle_idle_call+0x5c/0x94
> [   94.342509]  [<c02023f8>] cpu_idle+0x4e/0x6f
> [   94.342509]  [<c0534153>] rest_init+0x53/0x55
> [   94.342509]  [<c0781894>] start_kernel+0x2f0/0x2f5
> [   94.342509]  [<c0781091>] i386_start_kernel+0x91/0x96
> 
> Reported-by: Oliver Hartkopp <oliver@hartkopp.net>
> Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
> ---
> net/bluetooth/l2cap.c |    9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
> 
> --- linux-2.6.31.orig/net/bluetooth/l2cap.c	2009-10-09 08:32:46.000000000 +0800
> +++ linux-2.6.31/net/bluetooth/l2cap.c	2009-10-09 08:33:57.000000000 +0800
> @@ -555,12 +555,12 @@ static struct l2cap_conn *l2cap_conn_add
>  
>  	conn->feat_mask = 0;
>  
> -	setup_timer(&conn->info_timer, l2cap_info_timeout,
> -						(unsigned long) conn);
> -
>  	spin_lock_init(&conn->lock);
>  	rwlock_init(&conn->chan_list.lock);
>  
> +	setup_timer(&conn->info_timer, l2cap_info_timeout,
> +						(unsigned long) conn);
> +
>  	conn->disc_reason = 0x13;
>  
>  	return conn;
> @@ -783,6 +783,9 @@ static void l2cap_sock_init(struct sock 
>  	/* Default config options */
>  	pi->conf_len = 0;
>  	pi->flush_to = L2CAP_DEFAULT_FLUSH_TO;
> +	skb_queue_head_init(TX_QUEUE(sk));
> +	skb_queue_head_init(SREJ_QUEUE(sk));
> +	INIT_LIST_HEAD(SREJ_LIST(sk));
>  }
>  
>  static struct proto l2cap_proto = {


^ permalink raw reply

* Re: [BUG net-2.6] bluetooth/rfcomm : sleeping function called from invalid context at mm/slub.c:1719
From: Marcel Holtmann @ 2009-10-10 10:38 UTC (permalink / raw)
  To: Dave Young
  Cc: Gustavo F. Padovan, Oliver Hartkopp, Linux Netdev List,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA, Gustavo F. Padovan
In-Reply-To: <20091009004452.GA2395@darkstar>

Hi Dave,

> > * Dave Young <hidave.darkstar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> [2009-10-04 11:26:17 +0800]:
> > 
> > > 
> > > I can reproduce the bug.
> > > 
> > > It's probably caused by the l2cap changes by  Gustavo F. Padovan
> > > <gustavo-r7biftawOlEmmTcpzmvVSFAUjnlXr6A1@public.gmane.org>, I didn't see such problem after reverting
> > > Gustavo's patch series.
> > 
> > I can't reproduce the bug. I'm trying to reproduce it to figure out what of
> > my changes cause it.
> > 
> > I' running
> > 
> > $ dund -snu -i 00:11:67:CD:0F:CB # to pretend to be dialup/telephone
> > 
> > and on the other side 
> > 
> > $ rfcomm bind 0 00:11:67:CD:0F:CB 1
> > $ wvdial  # wvdial to /dev/rfcomm0
> > 
> > Both sides are on the same machine. Do you see any real difference
> > between my try and the call that get the bug?
> > 
> 
> Hi oliver
> 
> Could try following patch?
> ---
> 
> When shutdown ppp connection, lockdep waring about non-static key
> will happen, it is caused by the lock is not initialized properly
> at that time.
> 
> Fix with tuning the lock/skb_queue_head init order
> 
> [   94.339261] INFO: trying to register non-static key.
> [   94.342509] the code is fine but needs lockdep annotation.
> [   94.342509] turning off the locking correctness validator.
> [   94.342509] Pid: 0, comm: swapper Not tainted 2.6.31-mm1 #2
> [   94.342509] Call Trace:
> [   94.342509]  [<c0248fbe>] register_lock_class+0x58/0x241
> [   94.342509]  [<c024b5df>] ? __lock_acquire+0xb57/0xb73
> [   94.342509]  [<c024ab34>] __lock_acquire+0xac/0xb73
> [   94.342509]  [<c024b7fa>] ? lock_release_non_nested+0x17b/0x1de
> [   94.342509]  [<c024b662>] lock_acquire+0x67/0x84
> [   94.342509]  [<c04cd1eb>] ? skb_dequeue+0x15/0x41
> [   94.342509]  [<c054a857>] _spin_lock_irqsave+0x2f/0x3f
> [   94.342509]  [<c04cd1eb>] ? skb_dequeue+0x15/0x41
> [   94.342509]  [<c04cd1eb>] skb_dequeue+0x15/0x41
> [   94.342509]  [<c054a648>] ? _read_unlock+0x1d/0x20
> [   94.342509]  [<c04cd641>] skb_queue_purge+0x14/0x1b
> [   94.342509]  [<fab94fdc>] l2cap_recv_frame+0xea1/0x115a [l2cap]
> [   94.342509]  [<c024b5df>] ? __lock_acquire+0xb57/0xb73
> [   94.342509]  [<c0249c04>] ? mark_lock+0x1e/0x1c7
> [   94.342509]  [<f8364963>] ? hci_rx_task+0xd2/0x1bc [bluetooth]
> [   94.342509]  [<fab95346>] l2cap_recv_acldata+0xb1/0x1c6 [l2cap]
> [   94.342509]  [<f8364997>] hci_rx_task+0x106/0x1bc [bluetooth]
> [   94.342509]  [<fab95295>] ? l2cap_recv_acldata+0x0/0x1c6 [l2cap]
> [   94.342509]  [<c02302c4>] tasklet_action+0x69/0xc1
> [   94.342509]  [<c022fbef>] __do_softirq+0x94/0x11e
> [   94.342509]  [<c022fcaf>] do_softirq+0x36/0x5a
> [   94.342509]  [<c022fe14>] irq_exit+0x35/0x68
> [   94.342509]  [<c0204ced>] do_IRQ+0x72/0x89
> [   94.342509]  [<c02038ee>] common_interrupt+0x2e/0x34
> [   94.342509]  [<c024007b>] ? pm_qos_add_requirement+0x63/0x9d
> [   94.342509]  [<c038e8a5>] ? acpi_idle_enter_bm+0x209/0x238
> [   94.342509]  [<c049d238>] cpuidle_idle_call+0x5c/0x94
> [   94.342509]  [<c02023f8>] cpu_idle+0x4e/0x6f
> [   94.342509]  [<c0534153>] rest_init+0x53/0x55
> [   94.342509]  [<c0781894>] start_kernel+0x2f0/0x2f5
> [   94.342509]  [<c0781091>] i386_start_kernel+0x91/0x96
> 
> Reported-by: Oliver Hartkopp <oliver-fJ+pQTUTwRTk1uMJSBkQmQ@public.gmane.org>
> Signed-off-by: Dave Young <hidave.darkstar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

actually Gustavo send a patch titled "Initialize variables and timers
for both channel's sides" that should fix this, too. Can we test that
patch before I include this one.

Regards

Marcel

^ permalink raw reply

* Re: Ath5k data aborts
From: Nick Kossifidis @ 2009-10-10 10:56 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, ath5k-devel-xDcbHBWguxEUs3QNXV6qNA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <m3tyy8fkda.fsf-gTjgKJgtlHj77SC2UrCW1FMQynFLKtET@public.gmane.org>

2009/10/9 Krzysztof Halasa <khc@pm.waw.pl>:
> Hi,
>
> I have done a small investigation. IXP425 (ARM) in big-endian mode,
> EABI, mini-PCI atk5k wifi card, hostapd.
>
> Atheros Communications Inc. Atheros AR5001X+ Wireless Network Adapter (rev 01)
> Subsystem: Wistron NeWeb Corp. CM9 Wireless a/b/g MiniPCI Adapter
> 168c:0013 subsystem 185f:1012
>
>
> Results:
> Bad mode in data abort handler detected
> Internal error: Oops - bad mode: 0 [#1]
> LR is at ath5k_beacon_config+0x150/0x1d4 [ath5k]
>
> This means the PCI device didn't respond on the bus or something
> like that. Obviously the card is then unusable and the system needs to
> be restarted.
>
> Bisecting (I had to modify the procedure a bit since it only started to
> show up after other unrelated code was merged) shows the guilty commit:
> e8f055f0c3ba226ca599c14c2e5fe829f6f57cbb (ath5k: Update reset code).
>
> The problem exists with 2.6.30, 2.6.31 and current Linus' tree.
>
> Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
>
> ----------------------------------------------
> 2.6.30 appears to be fixed by:
>
> --- a/drivers/net/wireless/ath5k/reset.c
> +++ b/drivers/net/wireless/ath5k/reset.c
> @@ -476,7 +476,7 @@ static void ath5k_hw_set_sleep_clock(struct ath5k_hw *ah, bool enable)
>                (ah->ah_mac_version == (AR5K_SREV_AR2417 >> 4))) {
>                        ath5k_hw_reg_write(ah, 0x26, AR5K_PHY_SLMT);
>                        ath5k_hw_reg_write(ah, 0x0d, AR5K_PHY_SCAL);
> -                       ath5k_hw_reg_write(ah, 0x07, AR5K_PHY_SCLOCK);
> +                       ath5k_hw_reg_write(ah, 0x0C, AR5K_PHY_SCLOCK);
>                        ath5k_hw_reg_write(ah, 0x3f, AR5K_PHY_SDELAY);
>                        AR5K_REG_WRITE_BITS(ah, AR5K_PCICFG,
>                                AR5K_PCICFG_SLEEP_CLOCK_RATE, 0x02);
> @@ -490,8 +490,10 @@ static void ath5k_hw_set_sleep_clock(struct ath5k_hw *ah, bool enable)
>                }
>
>                /* Enable sleep clock operation */
> +#if 0
>                AR5K_REG_ENABLE_BITS(ah, AR5K_PCICFG,
>                                AR5K_PCICFG_SLEEP_CLOCK_EN);
> +#endif
>
>        } else {
>
>
>
> The AR5K_PHY_SCLOCK brings the old value (before the commit in question)
> back, I have no idea what is it. Leaving the new value causes the second
> run of hostapd to make the driver fail, the chip seems to not respond.
> It seems the value itself may be correct (as it works with 2.6.31+) but
> there is some additional bug fixed after 2.6.30, gitk show several
> candidate patches for this.
>
>
> Only disabling AR5K_PCICFG write makes the data abort go away.
>
> ----------------------------------------------
> 2.6.31 and Linus-current only need the AR5K_PCICFG change:
>
> --- a/drivers/net/wireless/ath/ath5k/reset.c
> +++ b/drivers/net/wireless/ath/ath5k/reset.c
> @@ -489,9 +489,10 @@ static void ath5k_hw_set_sleep_clock(struct ath5k_hw *ah, bool enable)
>                }
>
>                /* Enable sleep clock operation */
> +#if 0
>                AR5K_REG_ENABLE_BITS(ah, AR5K_PCICFG,
>                                AR5K_PCICFG_SLEEP_CLOCK_EN);
> -
> +#endif
>        } else {
>
>                /* Disable sleep clock operation and
>
>
> The question is, obviously, how to fix that for good. I can test the
> result.

Interesting, can you provide us with an eeprom info dump from your
card (using ath_info tool from madwifi svn) ?
It seems we need to skip sleep clock operation in your case.


-- 
GPG ID: 0xD21DB2DB
As you read this post global entropy rises. Have Fun ;-)
Nick
_______________________________________________
ath5k-devel mailing list
ath5k-devel@lists.ath5k.org
https://lists.ath5k.org/mailman/listinfo/ath5k-devel

^ permalink raw reply

* Re: [PATCH 1/3] iwmc3200top: Add Intel Wireless MultiCom 3200 top driver.
From: Marcel Holtmann @ 2009-10-10 11:01 UTC (permalink / raw)
  To: Tomas Winkler
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, linville-2XuSBdqkA4R54TAoqtyWWQ,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	linux-mmc-u79uwXL29TY76Z2rM5mHXA, yi.zhu-ral2JQCrhuEAvxtiuMwx3w,
	inaky.perez-gonzalez-ral2JQCrhuEAvxtiuMwx3w,
	cindy.h.kao-ral2JQCrhuEAvxtiuMwx3w,
	guy.cohen-ral2JQCrhuEAvxtiuMwx3w,
	ron.rindjunsky-ral2JQCrhuEAvxtiuMwx3w
In-Reply-To: <1253662724-16497-2-git-send-email-tomas.winkler-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

Hi Tomas,

> This patch adds Intel Wireless MultiCom 3200 top driver.
> IWMC3200 is 4Wireless Com CHIP (GPS/BT/WiFi/WiMAX).
> Top driver is responsible for device initialization and firmware download.
> Firmware handled by top is responsible for top itself and
> as well as bluetooth and GPS coms. (Wifi and WiMax provide their own
> firmware)
> In addition top driver is used to retrieve firmware logs
> and supports other debugging features
> 
> Signed-off-by: Tomas Winkler <tomas.winkler-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> ---
>  drivers/misc/Kconfig                   |    1 +
>  drivers/misc/Makefile                  |    1 +
>  drivers/misc/iwmc3200top/Kconfig       |   20 +
>  drivers/misc/iwmc3200top/Makefile      |   29 ++
>  drivers/misc/iwmc3200top/debugfs.c     |  145 +++++++
>  drivers/misc/iwmc3200top/debugfs.h     |   61 +++
>  drivers/misc/iwmc3200top/fw-download.c |  359 ++++++++++++++++
>  drivers/misc/iwmc3200top/fw-msg.h      |  113 +++++
>  drivers/misc/iwmc3200top/iwmc3200top.h |  202 +++++++++
>  drivers/misc/iwmc3200top/log.c         |  339 +++++++++++++++
>  drivers/misc/iwmc3200top/log.h         |  158 +++++++
>  drivers/misc/iwmc3200top/main.c        |  702 ++++++++++++++++++++++++++++++++
>  12 files changed, 2130 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/misc/iwmc3200top/Kconfig
>  create mode 100644 drivers/misc/iwmc3200top/Makefile
>  create mode 100644 drivers/misc/iwmc3200top/debugfs.c
>  create mode 100644 drivers/misc/iwmc3200top/debugfs.h
>  create mode 100644 drivers/misc/iwmc3200top/fw-download.c
>  create mode 100644 drivers/misc/iwmc3200top/fw-msg.h
>  create mode 100644 drivers/misc/iwmc3200top/iwmc3200top.h
>  create mode 100644 drivers/misc/iwmc3200top/log.c
>  create mode 100644 drivers/misc/iwmc3200top/log.h
>  create mode 100644 drivers/misc/iwmc3200top/main.c
> 
> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> index 68ab39d..6f85b43 100644
> --- a/drivers/misc/Kconfig
> +++ b/drivers/misc/Kconfig
> @@ -236,5 +236,6 @@ config ISL29003
>  source "drivers/misc/c2port/Kconfig"
>  source "drivers/misc/eeprom/Kconfig"
>  source "drivers/misc/cb710/Kconfig"
> +source "drivers/misc/iwmc3200top/Kconfig"
>  
>  endif # MISC_DEVICES
> diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> index 36f733c..97f5303 100644
> --- a/drivers/misc/Makefile
> +++ b/drivers/misc/Makefile
> @@ -20,5 +20,6 @@ obj-$(CONFIG_SGI_GRU)		+= sgi-gru/
>  obj-$(CONFIG_HP_ILO)		+= hpilo.o
>  obj-$(CONFIG_ISL29003)		+= isl29003.o
>  obj-$(CONFIG_C2PORT)		+= c2port/
> +obj-$(CONFIG_IWMC3200TOP)      += iwmc3200top/
>  obj-y				+= eeprom/
>  obj-y				+= cb710/
> diff --git a/drivers/misc/iwmc3200top/Kconfig b/drivers/misc/iwmc3200top/Kconfig
> new file mode 100644
> index 0000000..9e4b88f
> --- /dev/null
> +++ b/drivers/misc/iwmc3200top/Kconfig
> @@ -0,0 +1,20 @@
> +config IWMC3200TOP
> +        tristate "Intel Wireless MultiCom Top Driver"
> +        depends on MMC && EXPERIMENTAL
> +        select FW_LOADER
> +	---help---
> +	  Intel Wireless MultiCom 3200 Top driver is responsible for
> +	  for firmware load and enabled coms enumeration
> +
> +config IWMC3200TOP_DEBUG
> +	bool "Enable full debug output of iwmc3200top Driver"
> +	depends on IWMC3200TOP
> +	---help---
> +	  Enable full debug output of iwmc3200top Driver
> +
> +config IWMC3200TOP_DEBUGFS
> +	bool "Enable Debugfs debugging interface for iwmc3200top"
> +	depends on IWMC3200TOP
> +	---help---
> +	  Enable creation of debugfs files for iwmc3200top
> +
> diff --git a/drivers/misc/iwmc3200top/Makefile b/drivers/misc/iwmc3200top/Makefile
> new file mode 100644
> index 0000000..fbf53fb
> --- /dev/null
> +++ b/drivers/misc/iwmc3200top/Makefile
> @@ -0,0 +1,29 @@
> +# iwmc3200top - Intel Wireless MultiCom 3200 Top Driver
> +# drivers/misc/iwmc3200top/Makefile
> +#
> +# Copyright (C) 2009 Intel Corporation. All rights reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License version
> +# 2 as published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write to the Free Software
> +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> +# 02110-1301, USA.
> +#
> +#
> +# Author Name: Maxim Grabarnik <maxim.grabarnink-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> +#  -
> +#
> +#
> +
> +obj-$(CONFIG_IWMC3200TOP)	+= iwmc3200top.o
> +iwmc3200top-objs	:= main.o fw-download.o
> +iwmc3200top-$(CONFIG_IWMC3200TOP_DEBUG) += log.o
> +iwmc3200top-$(CONFIG_IWMC3200TOP_DEBUGFS) += debugfs.o
> diff --git a/drivers/misc/iwmc3200top/debugfs.c b/drivers/misc/iwmc3200top/debugfs.c
> new file mode 100644
> index 0000000..a531f6c
> --- /dev/null
> +++ b/drivers/misc/iwmc3200top/debugfs.c
> @@ -0,0 +1,145 @@
> +/*
> + * iwmc3200top - Intel Wireless MultiCom 3200 Top Driver
> + * drivers/misc/iwmc3200top/debufs.c
> + *
> + * Copyright (C) 2009 Intel Corporation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version
> + * 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> + * 02110-1301, USA.
> + *
> + *
> + * Author Name: Maxim Grabarnik <maxim.grabarnink-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> + *  -
> + *
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/string.h>
> +#include <linux/ctype.h>
> +#include <linux/mmc/sdio_func.h>
> +#include <linux/mmc/sdio.h>
> +
> +#include "iwmc3200top.h"
> +#include "log.h"
> +#include "fw-msg.h"
> +#include "debugfs.h"
> +
> +/*      Constants definition        */
> +#define HEXADECIMAL_RADIX	16
> +
> +/*      Functions definition        */
> +
> +
> +/*      Debugfs macros       */
> +#define DEBUGFS_ADD_DIR(name, parent) do {			\
> +	dbgfs->dir_##name = debugfs_create_dir(#name, parent);	\
> +	if (!(dbgfs->dir_##name))				\
> +		goto err;					\
> +} while (0)
> +
> +#define DEBUGFS_ADD_FILE(name, parent) do {				\
> +	dbgfs->dbgfs_##parent##_files.file_##name =                     \
> +	debugfs_create_file(#name, 0644, dbgfs->dir_##parent, priv,     \
> +				&iwmct_dbgfs_##name##_ops);               \
> +	if (!(dbgfs->dbgfs_##parent##_files.file_##name))		\
> +		goto err;						\
> +} while (0)
> +
> +#define DEBUGFS_REMOVE(name)  do {	\
> +	debugfs_remove(name);		\
> +	name = NULL;			\
> +} while (0)
> +
> +#define DEBUGFS_READ_FUNC(name)						\
> +ssize_t iwmct_dbgfs_##name##_read(struct file *file,			\
> +				  char __user *user_buf,		\
> +				  size_t count, loff_t *ppos);
> +
> +#define DEBUGFS_WRITE_FUNC(name)					\
> +ssize_t iwmct_dbgfs_##name##_write(struct file *file,			\
> +				   const char __user *user_buf,		\
> +				   size_t count, loff_t *ppos);
> +
> +#define DEBUGFS_READ_FILE_OPS(name)					\
> +	DEBUGFS_READ_FUNC(name)						\
> +	static const struct file_operations iwmct_dbgfs_##name##_ops = {  \
> +		.read = iwmct_dbgfs_##name##_read,			\
> +		.open = iwmct_dbgfs_open_file_generic,			\
> +	};
> +
> +#define DEBUGFS_WRITE_FILE_OPS(name)					\
> +	DEBUGFS_WRITE_FUNC(name)					\
> +	static const struct file_operations iwmct_dbgfs_##name##_ops = {  \
> +		.write = iwmct_dbgfs_##name##_write,			\
> +		.open = iwmct_dbgfs_open_file_generic,			\
> +	};
> +
> +#define DEBUGFS_READ_WRITE_FILE_OPS(name)				\
> +	DEBUGFS_READ_FUNC(name)						\
> +	DEBUGFS_WRITE_FUNC(name)					\
> +	static const struct file_operations iwmct_dbgfs_##name##_ops = {\
> +		.write = iwmct_dbgfs_##name##_write,			\
> +		.read = iwmct_dbgfs_##name##_read,			\
> +		.open = iwmct_dbgfs_open_file_generic,			\
> +	};
> +
> +
> +/*      Debugfs file ops definitions        */
> +
> +/*      Debugfs registration functions        */
> +
> +/*
> + * Create the debugfs files and directories
> + *
> + */

so for me the whole debug infrastructure is way too much overhead for
what this driver has to achieve. What is the benefit of having all these
debugfs support. What is it trying to achieve?

Also all these macros really obfuscate the code for no real benefit. If
you wanna keep debugfs support, then just use the functions directly.
They are not that complicated that an extra abstraction layer is needed.

> +void iwmct_dbgfs_register(struct iwmct_priv *priv, const char *name)
> +{
> +	struct iwmct_debugfs *dbgfs;
> +
> +	dbgfs = kzalloc(sizeof(struct iwmct_debugfs), GFP_KERNEL);
> +	if (!dbgfs) {
> +		LOG_ERROR(priv, DEBUGFS, "failed to allocate %zd bytes\n",
> +					sizeof(struct iwmct_debugfs));
> +		goto err;
> +	}
> +
> +	priv->dbgfs = dbgfs;
> +	dbgfs->name = name;
> +	dbgfs->dir_drv = debugfs_create_dir(name, NULL);
> +	if (!dbgfs->dir_drv) {
> +		LOG_ERROR(priv, DEBUGFS, "failed to create debugfs dir\n");
> +		goto err;
> +	}
> +
> +
> +err:
> +	return;
> +}

goto err. Seriously. Just call return in the error case. And also there
is a memory leak if the debugfs_create_dir fails.

> +/**
> + * Remove the debugfs files and directories
> + *
> + */
> +void iwmct_dbgfs_unregister(struct iwmct_debugfs *dbgfs)
> +{
> +	if (!dbgfs)
> +		return;
> +
> +
> +	DEBUGFS_REMOVE(dbgfs->dir_drv);
> +	kfree(dbgfs);
> +
> +	dbgfs = NULL;
> +}
> +
> diff --git a/drivers/misc/iwmc3200top/debugfs.h b/drivers/misc/iwmc3200top/debugfs.h
> new file mode 100644
> index 0000000..24b1369
> --- /dev/null
> +++ b/drivers/misc/iwmc3200top/debugfs.h
> @@ -0,0 +1,61 @@
> +/*
> + * iwmc3200top - Intel Wireless MultiCom 3200 Top Driver
> + * drivers/misc/iwmc3200top/debufs.h
> + *
> + * Copyright (C) 2009 Intel Corporation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version
> + * 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> + * 02110-1301, USA.
> + *
> + *
> + * Author Name: Maxim Grabarnik <maxim.grabarnink-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> + *  -
> + *
> + */
> +
> +#ifndef __DEBUGFS_H__
> +#define __DEBUGFS_H__
> +
> +
> +#ifdef CONFIG_IWMC3200TOP_DEBUGFS
> +
> +#include <linux/debugfs.h>
> +
> +struct iwmct_debugfs {
> +	const char *name;
> +	struct dentry *dir_drv;
> +	struct dir_drv_files {
> +	} dbgfs_drv_files;
> +};
> +
> +void iwmct_dbgfs_register(struct iwmct_priv *priv, const char *name);
> +void iwmct_dbgfs_unregister(struct iwmct_debugfs *dbgfs);
> +
> +#else /* CONFIG_IWMC3200TOP_DEBUGFS */
> +
> +/* struct iwmct_debugfs is empty if CONFIG_IWMC3200TOP_DEBUGFS is not defined */
> +struct iwmct_debugfs {
> +};

If struct iwmct_debugfs is never referenced directly then a simple
forward struct imwct_debugfs; declaration is good enough.

> +
> +static inline void
> +iwmct_dbgfs_register(struct iwmct_priv *priv, const char *name)
> +{}
> +
> +static inline void iwmct_dbgfs_unregister(struct iwmct_debugfs *dbgfs)
> +{}
> +
> +#endif /* CONFIG_IWMC3200TOP_DEBUGFS */
> +
> +#endif /* __DEBUGFS_H__ */
> +
> diff --git a/drivers/misc/iwmc3200top/fw-download.c b/drivers/misc/iwmc3200top/fw-download.c
> new file mode 100644
> index 0000000..33cb693
> --- /dev/null
> +++ b/drivers/misc/iwmc3200top/fw-download.c
> @@ -0,0 +1,359 @@
> +/*
> + * iwmc3200top - Intel Wireless MultiCom 3200 Top Driver
> + * drivers/misc/iwmc3200top/fw-download.c
> + *
> + * Copyright (C) 2009 Intel Corporation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version
> + * 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> + * 02110-1301, USA.
> + *
> + *
> + * Author Name: Maxim Grabarnik <maxim.grabarnink-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> + *  -
> + *
> + */
> +
> +#include <linux/firmware.h>
> +#include <linux/mmc/sdio_func.h>
> +#include <asm/unaligned.h>
> +
> +#include "iwmc3200top.h"
> +#include "log.h"
> +#include "fw-msg.h"
> +
> +#define CHECKSUM_BYTES_NUM sizeof(u32)
> +
> +/**
> +  init parser struct with file
> + */
> +static int iwmct_fw_parser_init(struct iwmct_priv *priv, const u8 *file,
> +			      size_t file_size, size_t block_size)
> +{
> +	struct iwmct_parser *parser = &priv->parser;
> +	struct iwmct_fw_hdr *fw_hdr = &parser->versions;
> +
> +	LOG_INFOEX(priv, INIT, "-->\n");
> +
> +	LOG_INFO(priv, FW_DOWNLOAD, "file_size=%zd\n", file_size);

This logging is overkill. Can we just hide this via pr_debug and then
use just simply dynamic_printk support.

> +
> +	parser->file = file;
> +	parser->file_size = file_size;
> +	parser->cur_pos = 0;
> +	parser->buf = NULL;
> +
> +	parser->buf = kzalloc(block_size, GFP_KERNEL);
> +	if (!parser->buf) {
> +		LOG_ERROR(priv, FW_DOWNLOAD, "kzalloc error\n");
> +		return -ENOMEM;
> +	}
> +	parser->buf_size = block_size;
> +
> +	/* extract fw versions */
> +	memcpy(fw_hdr, parser->file, sizeof(struct iwmct_fw_hdr));
> +	LOG_INFO(priv, FW_DOWNLOAD, "fw versions are:\n"
> +		"top %u.%u.%u gps %u.%u.%u bt %u.%u.%u tic %s\n",
> +		fw_hdr->top_major, fw_hdr->top_minor, fw_hdr->top_revision,
> +		fw_hdr->gps_major, fw_hdr->gps_minor, fw_hdr->gps_revision,
> +		fw_hdr->bt_major, fw_hdr->bt_minor, fw_hdr->bt_revision,
> +		fw_hdr->tic_name);
> +
> +	parser->cur_pos += sizeof(struct iwmct_fw_hdr);
> +
> +	LOG_INFOEX(priv, INIT, "<--\n");
> +	return 0;
> +}
> +
> +static bool iwmct_checksum(struct iwmct_priv *priv)
> +{
> +	struct iwmct_parser *parser = &priv->parser;
> +	__le32 *file = (__le32 *)parser->file;
> +	int i, pad, steps;
> +	u32 accum = 0;
> +	u32 checksum;
> +	u32 mask = 0xffffffff;
> +
> +	pad = (parser->file_size - CHECKSUM_BYTES_NUM) % 4;
> +	steps =  (parser->file_size - CHECKSUM_BYTES_NUM) / 4;
> +
> +	LOG_INFO(priv, FW_DOWNLOAD, "pad=%d steps=%d\n", pad, steps);
> +
> +	for (i = 0; i < steps; i++)
> +		accum += le32_to_cpu(file[i]);
> +
> +	if (pad) {
> +		mask <<= 8 * (4 - pad);
> +		accum += le32_to_cpu(file[steps]) & mask;
> +	}
> +
> +	checksum = get_unaligned_le32((__le32 *)(parser->file +
> +			parser->file_size - CHECKSUM_BYTES_NUM));

The access to file and parser->file. One with unaligned access and the
other with, looks wrong. And it looks way to complicated.

Can we just not have a proper __le32 type of parser->file to begin with.

> +
> +	LOG_INFO(priv, FW_DOWNLOAD,
> +		"compare checksum accum=0x%x to checksum=0x%x\n",
> +		accum, checksum);
> +
> +	return checksum == accum;
> +}
> +
> +static int iwmct_parse_next_section(struct iwmct_priv *priv, const u8 **p_sec,
> +				  size_t *sec_size, __le32 *sec_addr)
> +{
> +	struct iwmct_parser *parser = &priv->parser;
> +	struct iwmct_dbg *dbg = &priv->dbg;
> +	struct iwmct_fw_sec_hdr *sec_hdr;
> +
> +	LOG_INFOEX(priv, INIT, "-->\n");
> +
> +	while (parser->cur_pos + sizeof(struct iwmct_fw_sec_hdr)
> +		<= parser->file_size) {
> +
> +		sec_hdr = (struct iwmct_fw_sec_hdr *)
> +				(parser->file + parser->cur_pos);
> +		parser->cur_pos += sizeof(struct iwmct_fw_sec_hdr);
> +
> +		LOG_INFO(priv, FW_DOWNLOAD,
> +			"sec hdr: type=%s addr=0x%x size=%d\n",
> +			sec_hdr->type, sec_hdr->target_addr,
> +			sec_hdr->data_size);
> +
> +		if (strcmp(sec_hdr->type, "ENT") == 0)
> +			parser->entry_point = le32_to_cpu(sec_hdr->target_addr);
> +		else if (strcmp(sec_hdr->type, "LBL") == 0)
> +			strcpy(dbg->label_fw, parser->file + parser->cur_pos);
> +		else if (((strcmp(sec_hdr->type, "TOP") == 0) &&
> +			  (priv->barker & BARKER_DNLOAD_TOP_MSK)) ||
> +			 ((strcmp(sec_hdr->type, "GPS") == 0) &&
> +			  (priv->barker & BARKER_DNLOAD_GPS_MSK)) ||
> +			 ((strcmp(sec_hdr->type, "BTH") == 0) &&
> +			  (priv->barker & BARKER_DNLOAD_BT_MSK))) {
> +			*sec_addr = sec_hdr->target_addr;
> +			*sec_size = le32_to_cpu(sec_hdr->data_size);
> +			*p_sec = parser->file + parser->cur_pos;
> +			parser->cur_pos += le32_to_cpu(sec_hdr->data_size);
> +			return 1;

This if statement is unreadable. This indentation gives me a headache.

Also this function better return bool and not int.

> +		} else if (strcmp(sec_hdr->type, "LOG") != 0)
> +			LOG_WARNING(priv, FW_DOWNLOAD,
> +				    "skipping section type %s\n",
> +				    sec_hdr->type);
> +
> +		parser->cur_pos += le32_to_cpu(sec_hdr->data_size);
> +		LOG_INFO(priv, FW_DOWNLOAD,
> +			"finished with section cur_pos=%zd\n", parser->cur_pos);
> +	}
> +
> +	LOG_INFOEX(priv, INIT, "<--\n");
> +	return 0;
> +}
> +
> +static int iwmct_download_section(struct iwmct_priv *priv, const u8 *p_sec,
> +				size_t sec_size, __le32 addr)
> +{
> +	struct iwmct_parser *parser = &priv->parser;
> +	struct iwmct_fw_load_hdr *hdr = (struct iwmct_fw_load_hdr *)parser->buf;
> +	const u8 *cur_block = p_sec;
> +	size_t sent = 0;
> +	int cnt = 0;
> +	int ret = 0;
> +	u32 cmd = 0;
> +
> +	LOG_INFOEX(priv, INIT, "-->\n");
> +	LOG_INFO(priv, FW_DOWNLOAD, "Download address 0x%x size 0x%zx\n",
> +				addr, sec_size);
> +
> +	while (sent < sec_size) {
> +		int i;
> +		u32 chksm = 0;
> +		u32 reset = atomic_read(&priv->reset);
> +		/* actual FW data */
> +		u32 data_size = min(parser->buf_size - sizeof(*hdr),
> +				    sec_size - sent);
> +		/* Pad to block size */
> +		u32 trans_size = (data_size + sizeof(*hdr) +
> +				  IWMC_SDIO_BLK_SIZE - 1) &
> +				  ~(IWMC_SDIO_BLK_SIZE - 1);
> +		++cnt;
> +
> +		/* in case of reset, interrupt FW DOWNLAOD */
> +		if (reset) {
> +			LOG_INFO(priv, FW_DOWNLOAD,
> +				 "Reset detected. Abort FW download!!!");
> +			ret = -ECANCELED;
> +			goto exit;
> +		}
> +
> +		memset(parser->buf, 0, parser->buf_size);
> +		cmd |= IWMC_OPCODE_WRITE << CMD_HDR_OPCODE_POS;
> +		cmd |= IWMC_CMD_SIGNATURE << CMD_HDR_SIGNATURE_POS;
> +		cmd |= (priv->dbg.direct ? 1 : 0) << CMD_HDR_DIRECT_ACCESS_POS;
> +		cmd |= (priv->dbg.checksum ? 1 : 0) << CMD_HDR_USE_CHECKSUM_POS;
> +		hdr->data_size = cpu_to_le32(data_size);
> +		hdr->target_addr = addr;
> +
> +		/* checksum is allowed for sizes divisible by 4 */
> +		if (data_size & 0x3)
> +			cmd &= ~CMD_HDR_USE_CHECKSUM_MSK;
> +
> +		memcpy(hdr->data, cur_block, data_size);
> +
> +
> +		if (cmd & CMD_HDR_USE_CHECKSUM_MSK) {
> +
> +			chksm = data_size + le32_to_cpu(addr) + cmd;
> +			for (i = 0; i < data_size >> 2; i++)
> +				chksm += ((u32 *)cur_block)[i];
> +
> +			hdr->block_chksm = cpu_to_le32(chksm);
> +			LOG_INFO(priv, FW_DOWNLOAD, "Checksum = 0x%X\n",
> +				 hdr->block_chksm);
> +		}
> +
> +		LOG_INFO(priv, FW_DOWNLOAD, "trans#%d, len=%d, sent=%zd, "
> +				"sec_size=%zd, startAddress 0x%X\n",
> +				cnt, trans_size, sent, sec_size, addr);
> +
> +		if (priv->dbg.dump)
> +			LOG_HEXDUMP(FW_DOWNLOAD, parser->buf, trans_size);
> +
> +
> +		hdr->cmd = cpu_to_le32(cmd);
> +		/* send it down */
> +		/* TODO: add more proper sending and error checking */
> +		ret = iwmct_tx(priv, 0, parser->buf, trans_size);
> +		if (ret != 0) {
> +			LOG_INFO(priv, FW_DOWNLOAD,
> +				"iwmct_tx returned %d\n", ret);
> +			goto exit;
> +		}
> +
> +		addr = cpu_to_le32(le32_to_cpu(addr) + data_size);
> +		sent += data_size;
> +		cur_block = p_sec + sent;
> +
> +		if (priv->dbg.blocks && (cnt + 1) >= priv->dbg.blocks) {
> +			LOG_INFO(priv, FW_DOWNLOAD,
> +				"Block number limit is reached [%d]\n",
> +				priv->dbg.blocks);
> +			break;
> +		}
> +	}
> +
> +	if (sent < sec_size)
> +		ret = -EINVAL;
> +exit:
> +	LOG_INFOEX(priv, INIT, "<--\n");
> +	return ret;
> +}
> +
> +static int iwmct_kick_fw(struct iwmct_priv *priv, bool jump)
> +{
> +	struct iwmct_parser *parser = &priv->parser;
> +	struct iwmct_fw_load_hdr *hdr = (struct iwmct_fw_load_hdr *)parser->buf;
> +	int ret;
> +	u32 cmd;
> +
> +	LOG_INFOEX(priv, INIT, "-->\n");
> +
> +	memset(parser->buf, 0, parser->buf_size);
> +	cmd = IWMC_CMD_SIGNATURE << CMD_HDR_SIGNATURE_POS;
> +	if (jump) {
> +		cmd |= IWMC_OPCODE_JUMP << CMD_HDR_OPCODE_POS;
> +		hdr->target_addr = cpu_to_le32(parser->entry_point);
> +		LOG_INFO(priv, FW_DOWNLOAD, "jump address 0x%x\n",
> +				parser->entry_point);
> +	} else {
> +		cmd |= IWMC_OPCODE_LAST_COMMAND << CMD_HDR_OPCODE_POS;
> +		LOG_INFO(priv, FW_DOWNLOAD, "last command\n");
> +	}
> +
> +	hdr->cmd = cpu_to_le32(cmd);
> +
> +	LOG_HEXDUMP(FW_DOWNLOAD, parser->buf, sizeof(*hdr));
> +	/* send it down */
> +	/* TODO: add more proper sending and error checking */
> +	ret = iwmct_tx(priv, 0, parser->buf, IWMC_SDIO_BLK_SIZE);
> +	if (ret)
> +		LOG_INFO(priv, FW_DOWNLOAD, "iwmct_tx returned %d", ret);
> +
> +	LOG_INFOEX(priv, INIT, "<--\n");
> +	return 0;
> +}
> +
> +int iwmct_fw_load(struct iwmct_priv *priv)
> +{
> +	const struct firmware *raw = NULL;
> +	__le32 addr;
> +	size_t len;
> +	const u8 *pdata;
> +	const u8 *name = "iwmc3200top.1.fw";
> +	int ret = 0;
> +
> +	/* clear parser struct */
> +	memset(&priv->parser, 0, sizeof(struct iwmct_parser));
> +	if (!name) {
> +		ret = -EINVAL;
> +		goto exit;
> +	}
> +
> +	/* get the firmware */
> +	ret = request_firmware(&raw, name, &priv->func->dev);
> +	if (ret < 0) {
> +		LOG_ERROR(priv, FW_DOWNLOAD, "%s request_firmware failed %d\n",
> +			  name, ret);
> +		goto exit;
> +	}
> +
> +	if (raw->size < sizeof(struct iwmct_fw_sec_hdr)) {
> +		LOG_ERROR(priv, FW_DOWNLOAD, "%s smaller then (%zd) (%zd)\n",
> +			  name, sizeof(struct iwmct_fw_sec_hdr), raw->size);
> +		goto exit;
> +	}
> +
> +	LOG_INFO(priv, FW_DOWNLOAD, "Read firmware '%s'\n", name);
> +
> +	ret = iwmct_fw_parser_init(priv, raw->data, raw->size, priv->trans_len);
> +	if (ret < 0) {
> +		LOG_ERROR(priv, FW_DOWNLOAD,
> +			  "iwmct_parser_init failed: Reason %d\n", ret);
> +		goto exit;
> +	}
> +
> +	/* checksum  */
> +	if (!iwmct_checksum(priv)) {
> +		LOG_ERROR(priv, FW_DOWNLOAD, "checksum error\n");
> +		ret = -EINVAL;
> +		goto exit;
> +	}
> +
> +	/* download firmware to device */
> +	while (iwmct_parse_next_section(priv, &pdata, &len, &addr)) {
> +		if (iwmct_download_section(priv, pdata, len, addr)) {
> +			LOG_ERROR(priv, FW_DOWNLOAD,
> +				  "%s download section failed\n", name);
> +			ret = -EIO;
> +			goto exit;
> +		}
> +	}
> +
> +	iwmct_kick_fw(priv, !!(priv->barker & BARKER_DNLOAD_JUMP_MSK));
> +
> +exit:
> +	kfree(priv->parser.buf);
> +
> +	if (raw)
> +		release_firmware(raw);
> +
> +	raw = NULL;
> +
> +	return ret;
> +}
> diff --git a/drivers/misc/iwmc3200top/fw-msg.h b/drivers/misc/iwmc3200top/fw-msg.h
> new file mode 100644
> index 0000000..9e26b75
> --- /dev/null
> +++ b/drivers/misc/iwmc3200top/fw-msg.h
> @@ -0,0 +1,113 @@
> +/*
> + * iwmc3200top - Intel Wireless MultiCom 3200 Top Driver
> + * drivers/misc/iwmc3200top/fw-msg.h
> + *
> + * Copyright (C) 2009 Intel Corporation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version
> + * 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> + * 02110-1301, USA.
> + *
> + *
> + * Author Name: Maxim Grabarnik <maxim.grabarnink-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> + *  -
> + *
> + */
> +
> +#ifndef __FWMSG_H__
> +#define __FWMSG_H__
> +
> +#define COMM_TYPE_D2H           	0xFF
> +#define COMM_TYPE_H2D           	0xEE
> +
> +#define COMM_CATEGORY_OPERATIONAL      	0x00
> +#define COMM_CATEGORY_DEBUG            	0x01
> +#define COMM_CATEGORY_TESTABILITY      	0x02
> +#define COMM_CATEGORY_DIAGNOSTICS      	0x03
> +
> +#define OP_DBG_ZSTR_MSG			cpu_to_le16(0x1A)
> +
> +#define FW_LOG_SRC_MAX			32
> +#define FW_LOG_SRC_ALL			255
> +
> +#define FW_STRING_TABLE_ADDR		cpu_to_le32(0x0C000000)
> +
> +#define CMD_DBG_LOG_LEVEL		cpu_to_le16(0x0001)
> +#define CMD_TST_DEV_RESET		cpu_to_le16(0x0060)
> +#define CMD_TST_FUNC_RESET		cpu_to_le16(0x0062)
> +#define CMD_TST_IFACE_RESET		cpu_to_le16(0x0064)
> +#define CMD_TST_CPU_UTILIZATION		cpu_to_le16(0x0065)
> +#define CMD_TST_TOP_DEEP_SLEEP		cpu_to_le16(0x0080)
> +#define CMD_TST_WAKEUP			cpu_to_le16(0x0081)
> +#define CMD_TST_FUNC_WAKEUP		cpu_to_le16(0x0082)
> +#define CMD_TST_FUNC_DEEP_SLEEP_REQUEST	cpu_to_le16(0x0083)
> +#define CMD_TST_GET_MEM_DUMP		cpu_to_le16(0x0096)
> +
> +#define OP_OPR_ALIVE			cpu_to_le16(0x0010)
> +#define OP_OPR_CMD_ACK			cpu_to_le16(0x001F)
> +#define OP_OPR_CMD_NACK			cpu_to_le16(0x0020)
> +#define OP_TST_MEM_DUMP			cpu_to_le16(0x0043)
> +
> +#define CMD_FLAG_PADDING_256		0x80
> +
> +#define FW_HCMD_BLOCK_SIZE      	256
> +
> +struct msg_hdr {
> +	u8 type;
> +	u8 category;
> +	__le16 opcode;
> +	u8 seqnum;
> +	u8 flags;
> +	__le16 length;
> +} __attribute__((__packed__));
> +
> +struct log_hdr {
> +	__le32 timestamp;
> +	u8 severity;
> +	u8 logsource;
> +	__le16 reserved;
> +} __attribute__((__packed__));
> +
> +struct mdump_hdr {
> +	u8 dmpid;
> +	u8 frag;
> +	__le16 size;
> +	__le32 addr;
> +} __attribute__((__packed__));
> +
> +struct top_msg {
> +	struct msg_hdr hdr;
> +	union {
> +		/* D2H messages */
> +		struct {
> +			struct log_hdr log_hdr;
> +			u8 data[1];
> +		} __attribute__((__packed__)) log;
> +
> +		struct {
> +			struct log_hdr log_hdr;
> +			struct mdump_hdr md_hdr;
> +			u8 data[1];
> +		} __attribute__((__packed__)) mdump;
> +
> +		/* H2D messages */
> +		struct {
> +			u8 logsource;
> +			u8 sevmask;
> +		} __attribute__((__packed__)) logdefs[FW_LOG_SRC_MAX];
> +		struct mdump_hdr mdump_req;
> +	} u;
> +} __attribute__((__packed__));
> +
> +
> +#endif /* __FWMSG_H__ */
> diff --git a/drivers/misc/iwmc3200top/iwmc3200top.h b/drivers/misc/iwmc3200top/iwmc3200top.h
> new file mode 100644
> index 0000000..59e4b7a
> --- /dev/null
> +++ b/drivers/misc/iwmc3200top/iwmc3200top.h
> @@ -0,0 +1,202 @@
> +/*
> + * iwmc3200top - Intel Wireless MultiCom 3200 Top Driver
> + * drivers/misc/iwmc3200top/iwmc3200top.h
> + *
> + * Copyright (C) 2009 Intel Corporation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version
> + * 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> + * 02110-1301, USA.
> + *
> + *
> + * Author Name: Maxim Grabarnik <maxim.grabarnink-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> + *  -
> + *
> + */
> +
> +#ifndef __IWMC3200TOP_H__
> +#define __IWMC3200TOP_H__
> +
> +#define DRV_NAME "iwmc3200top"
> +
> +#define IWMC_SDIO_BLK_SIZE			256
> +#define IWMC_DEFAULT_TR_BLK			64
> +#define IWMC_SDIO_DATA_ADDR			0x0
> +#define IWMC_SDIO_INTR_ENABLE_ADDR		0x14
> +#define IWMC_SDIO_INTR_STATUS_ADDR		0x13
> +#define IWMC_SDIO_INTR_CLEAR_ADDR		0x13
> +#define IWMC_SDIO_INTR_GET_SIZE_ADDR		0x2C
> +
> +#define COMM_HUB_HEADER_LENGTH 16
> +#define LOGGER_HEADER_LENGTH   10
> +
> +
> +#define BARKER_DNLOAD_BT_POS		0
> +#define BARKER_DNLOAD_BT_MSK		BIT(BARKER_DNLOAD_BT_POS)
> +#define BARKER_DNLOAD_GPS_POS		1
> +#define BARKER_DNLOAD_GPS_MSK		BIT(BARKER_DNLOAD_GPS_POS)
> +#define BARKER_DNLOAD_TOP_POS		2
> +#define BARKER_DNLOAD_TOP_MSK		BIT(BARKER_DNLOAD_TOP_POS)
> +#define BARKER_DNLOAD_RESERVED1_POS	3
> +#define BARKER_DNLOAD_RESERVED1_MSK	BIT(BARKER_DNLOAD_RESERVED1_POS)
> +#define BARKER_DNLOAD_JUMP_POS		4
> +#define BARKER_DNLOAD_JUMP_MSK		BIT(BARKER_DNLOAD_JUMP_POS)
> +#define BARKER_DNLOAD_SYNC_POS		5
> +#define BARKER_DNLOAD_SYNC_MSK		BIT(BARKER_DNLOAD_SYNC_POS)
> +#define BARKER_DNLOAD_RESERVED2_POS	6
> +#define BARKER_DNLOAD_RESERVED2_MSK	(0x3 << BARKER_DNLOAD_RESERVED2_POS)
> +#define BARKER_DNLOAD_BARKER_POS	8
> +#define BARKER_DNLOAD_BARKER_MSK	(0xffffff << BARKER_DNLOAD_BARKER_POS)
> +
> +#define IWMC_BARKER_REBOOT 	(0xdeadbe << BARKER_DNLOAD_BARKER_POS)
> +/* whole field barker */
> +#define IWMC_BARKER_ACK 	0xfeedbabe
> +
> +#define IWMC_CMD_SIGNATURE 	0xcbbc
> +
> +#define CMD_HDR_OPCODE_POS		0
> +#define CMD_HDR_OPCODE_MSK_MSK		(0xf << CMD_HDR_OPCODE_MSK_POS)
> +#define CMD_HDR_RESPONSE_CODE_POS	4
> +#define CMD_HDR_RESPONSE_CODE_MSK	(0xf << CMD_HDR_RESPONSE_CODE_POS)
> +#define CMD_HDR_USE_CHECKSUM_POS	8
> +#define CMD_HDR_USE_CHECKSUM_MSK	BIT(CMD_HDR_USE_CHECKSUM_POS)
> +#define CMD_HDR_RESPONSE_REQUIRED_POS	9
> +#define CMD_HDR_RESPONSE_REQUIRED_MSK	BIT(CMD_HDR_RESPONSE_REQUIRED_POS)
> +#define CMD_HDR_DIRECT_ACCESS_POS	10
> +#define CMD_HDR_DIRECT_ACCESS_MSK	BIT(CMD_HDR_DIRECT_ACCESS_POS)
> +#define CMD_HDR_RESERVED_POS		11
> +#define CMD_HDR_RESERVED_MSK		BIT(0x1f << CMD_HDR_RESERVED_POS)
> +#define CMD_HDR_SIGNATURE_POS		16
> +#define CMD_HDR_SIGNATURE_MSK		BIT(0xffff << CMD_HDR_SIGNATURE_POS)
> +
> +enum {
> +	IWMC_OPCODE_PING = 0,
> +	IWMC_OPCODE_READ = 1,
> +	IWMC_OPCODE_WRITE = 2,
> +	IWMC_OPCODE_JUMP = 3,
> +	IWMC_OPCODE_REBOOT = 4,
> +	IWMC_OPCODE_PERSISTENT_WRITE = 5,
> +	IWMC_OPCODE_PERSISTENT_READ = 6,
> +	IWMC_OPCODE_READ_MODIFY_WRITE = 7,
> +	IWMC_OPCODE_LAST_COMMAND = 15
> +};
> +
> +struct iwmct_fw_load_hdr {
> +	__le32 cmd;
> +	__le32 target_addr;
> +	__le32 data_size;
> +	__le32 block_chksm;
> +	u8 data[0];
> +};
> +
> +/**
> + * struct iwmct_fw_hdr
> + * holds all sw components versions
> + */
> +struct iwmct_fw_hdr {
> +	u8 top_major;
> +	u8 top_minor;
> +	u8 top_revision;
> +	u8 gps_major;
> +	u8 gps_minor;
> +	u8 gps_revision;
> +	u8 bt_major;
> +	u8 bt_minor;
> +	u8 bt_revision;
> +	u8 tic_name[31];
> +};
> +
> +/**
> + * struct iwmct_fw_sec_hdr
> + * @type: function type
> + * @data_size: section's data size
> + * @target_addr: download address
> + */
> +struct iwmct_fw_sec_hdr {
> +	u8 type[4];
> +	__le32 data_size;
> +	__le32 target_addr;
> +};
> +
> +/**
> + * struct iwmct_parser
> + * @file: fw image
> + * @file_size: fw size
> + * @cur_pos: position in file
> + * @buf: temp buf for download
> + * @buf_size: size of buf
> + * @entry_point: address to jump in fw kick-off
> + */
> +struct iwmct_parser {
> +	const u8 *file;
> +	size_t file_size;
> +	size_t cur_pos;
> +	u8 *buf;
> +	size_t buf_size;
> +	u32 entry_point;
> +	struct iwmct_fw_hdr versions;
> +};
> +
> +
> +struct iwmct_work_struct {
> +	struct list_head list;
> +	ssize_t iosize;
> +};
> +
> +struct iwmct_dbg {
> +	int blocks;
> +	bool dump;
> +	bool jump;
> +	bool direct;
> +	bool checksum;
> +	bool fw_download;
> +	int block_size;
> +	int download_trans_blks;
> +
> +	char label_fw[256];
> +};
> +
> +struct iwmct_priv {
> +	struct sdio_func *func;
> +	struct iwmct_debugfs *dbgfs;
> +	struct iwmct_parser parser;
> +	atomic_t reset;
> +	atomic_t dev_sync;
> +	u32 trans_len;
> +	u32 barker;
> +	struct iwmct_dbg dbg;
> +
> +	/* drivers work queue */
> +	struct workqueue_struct *wq;
> +	struct workqueue_struct *bus_rescan_wq;
> +	struct work_struct bus_rescan_worker;
> +	struct work_struct isr_worker;
> +
> +	/* drivers wait queue */
> +	wait_queue_head_t wait_q;
> +
> +	/* rx request list */
> +	struct list_head read_req_list;
> +};
> +
> +extern int iwmct_tx(struct iwmct_priv *priv, unsigned int addr,
> +		void *src, int count);
> +
> +extern int iwmct_fw_load(struct iwmct_priv *priv);
> +
> +extern void iwmct_dbg_init_params(struct iwmct_priv *drv);
> +extern void iwmct_dbg_init_drv_attrs(struct device_driver *drv);
> +extern void iwmct_dbg_remove_drv_attrs(struct device_driver *drv);
> +extern int iwmct_send_hcmd(struct iwmct_priv *priv, u8 *cmd, u16 len);
> +
> +#endif  /*  __IWMC3200TOP_H__  */
> diff --git a/drivers/misc/iwmc3200top/log.c b/drivers/misc/iwmc3200top/log.c
> new file mode 100644
> index 0000000..96b5e4a
> --- /dev/null
> +++ b/drivers/misc/iwmc3200top/log.c
> @@ -0,0 +1,339 @@
> +/*
> + * iwmc3200top - Intel Wireless MultiCom 3200 Top Driver
> + * drivers/misc/iwmc3200top/log.c
> + *
> + * Copyright (C) 2009 Intel Corporation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version
> + * 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> + * 02110-1301, USA.
> + *
> + *
> + * Author Name: Maxim Grabarnik <maxim.grabarnink-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> + *  -
> + *
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/mmc/sdio_func.h>
> +#include <linux/ctype.h>
> +#include "fw-msg.h"
> +#include "iwmc3200top.h"
> +#include "log.h"
> +
> +/* Maximal hexadecimal string size of the FW memdump message */
> +#define LOG_MSG_SIZE_MAX		12400
> +
> +/* iwmct_logdefs is a global used by log macros */
> +u8 iwmct_logdefs[LOG_SRC_MAX];
> +static u8 iwmct_fw_logdefs[FW_LOG_SRC_MAX];
> +
> +
> +static int _log_set_log_filter(u8 *logdefs, int size, u8 src, u8 logmask)
> +{
> +	int i;
> +
> +	if (src < size)
> +		logdefs[src] = logmask;
> +	else if (src == LOG_SRC_ALL)
> +		for (i = 0; i < size; i++)
> +			logdefs[i] = logmask;
> +	else
> +		return -1;
> +
> +	return 0;
> +}
> +
> +
> +int iwmct_log_set_filter(u8 src, u8 logmask)
> +{
> +	return _log_set_log_filter(iwmct_logdefs, LOG_SRC_MAX, src, logmask);
> +}
> +
> +
> +int iwmct_log_set_fw_filter(u8 src, u8 logmask)
> +{
> +	return _log_set_log_filter(iwmct_fw_logdefs,
> +				   FW_LOG_SRC_MAX, src, logmask);
> +}
> +
> +
> +static int log_msg_format_hex(char *str, int slen, u8 *ibuf,
> +			      int ilen, char *pref)
> +{
> +	int pos = 0;
> +	int i;
> +	int len;
> +
> +	for (pos = 0, i = 0; pos < slen - 2 && pref[i] != '\0'; i++, pos++)
> +		str[pos] = pref[i];
> +
> +	for (i = 0; pos < slen - 2 && i < ilen; pos += len, i++)
> +		len = snprintf(&str[pos], slen - pos - 1, " %2.2X", ibuf[i]);
> +
> +	if (i < ilen)
> +		return -1;
> +
> +	return 0;
> +}
> +
> +/*	NOTE: This function is not thread safe.
> +	Currently it's called only from sdio rx worker - no race there
> +*/
> +void iwmct_log_top_message(struct iwmct_priv *priv, u8 *buf, int len)
> +{
> +	struct top_msg *msg;
> +	static char logbuf[LOG_MSG_SIZE_MAX];
> +
> +	msg = (struct top_msg *)buf;
> +
> +	if (len < sizeof(msg->hdr) + sizeof(msg->u.log.log_hdr)) {
> +		LOG_ERROR(priv, FW_MSG, "Log message from TOP "
> +			  "is too short %d (expected %zd)\n",
> +			  len, sizeof(msg->hdr) + sizeof(msg->u.log.log_hdr));
> +		return;
> +	}
> +
> +	if (!(iwmct_fw_logdefs[msg->u.log.log_hdr.logsource] &
> +		BIT(msg->u.log.log_hdr.severity)) ||
> +	    !(iwmct_logdefs[LOG_SRC_FW_MSG] & BIT(msg->u.log.log_hdr.severity)))
> +		return;
> +
> +	switch (msg->hdr.category) {
> +	case COMM_CATEGORY_TESTABILITY:
> +		if (!(iwmct_logdefs[LOG_SRC_TST] &
> +		      BIT(msg->u.log.log_hdr.severity)))
> +			return;
> +		if (log_msg_format_hex(logbuf, LOG_MSG_SIZE_MAX, buf,
> +				       le16_to_cpu(msg->hdr.length) +
> +				       sizeof(msg->hdr), "<TST>"))
> +			LOG_WARNING(priv, TST,
> +				  "TOP TST message is too long, truncating...");
> +		LOG_WARNING(priv, TST, "%s\n", logbuf);
> +		break;
> +	case COMM_CATEGORY_DEBUG:
> +		if (msg->hdr.opcode == OP_DBG_ZSTR_MSG)
> +			LOG_INFO(priv, FW_MSG, "%s %s", "<DBG>",
> +				       ((u8 *)msg) + sizeof(msg->hdr)
> +					+ sizeof(msg->u.log.log_hdr));
> +		else {
> +			if (log_msg_format_hex(logbuf, LOG_MSG_SIZE_MAX, buf,
> +					le16_to_cpu(msg->hdr.length)
> +						+ sizeof(msg->hdr),
> +					"<DBG>"))
> +				LOG_WARNING(priv, FW_MSG,
> +					"TOP DBG message is too long,"
> +					"truncating...");
> +			LOG_WARNING(priv, FW_MSG, "%s\n", logbuf);
> +		}
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
> +static int _log_get_filter_str(u8 *logdefs, int logdefsz, char *buf, int size)
> +{
> +	int i, pos, len;
> +	for (i = 0, pos = 0; (pos < size-1) && (i < logdefsz); i++) {
> +		len = snprintf(&buf[pos], size - pos - 1, "0x%02X%02X,",
> +				i, logdefs[i]);
> +		pos += len;
> +	}
> +	buf[pos-1] = '\n';
> +	buf[pos] = '\0';
> +
> +	if (i < logdefsz)
> +		return -1;
> +	return 0;
> +}
> +
> +int log_get_filter_str(char *buf, int size)
> +{
> +	return _log_get_filter_str(iwmct_logdefs, LOG_SRC_MAX, buf, size);
> +}
> +
> +int log_get_fw_filter_str(char *buf, int size)
> +{
> +	return _log_get_filter_str(iwmct_fw_logdefs, FW_LOG_SRC_MAX, buf, size);
> +}
> +
> +#define HEXADECIMAL_RADIX	16
> +#define LOG_SRC_FORMAT		7 /* log level is in format of "0xXXXX," */
> +
> +ssize_t show_iwmct_log_level(struct device *d,
> +				struct device_attribute *attr, char *buf)
> +{
> +	struct iwmct_priv *priv = d->driver_data;
> +	char *str_buf;
> +	int buf_size;
> +	ssize_t ret = -EAGAIN;
> +
> +	buf_size = (LOG_SRC_FORMAT * LOG_SRC_MAX) + 1;
> +	str_buf = kzalloc(buf_size, GFP_KERNEL);
> +	if (!str_buf) {
> +		LOG_ERROR(priv, DEBUGFS,
> +			"failed to allocate %d bytes\n", buf_size);
> +		goto exit;
> +	}
> +
> +	if (log_get_filter_str(str_buf, buf_size) < 0)
> +		goto exit;
> +
> +	ret = sprintf(buf, "%s", str_buf);
> +
> +exit:
> +	kfree(str_buf);
> +	return ret;
> +}
> +
> +ssize_t store_iwmct_log_level(struct device *d,
> +			struct device_attribute *attr,
> +			const char *buf, size_t count)
> +{
> +	struct iwmct_priv *priv = d->driver_data;
> +	char *token, *str_buf = NULL;
> +	long val;
> +	u8 src, mask;
> +
> +	if (!count)
> +		goto err;
> +
> +	str_buf = kzalloc(count, GFP_KERNEL);
> +	if (!str_buf) {
> +		LOG_ERROR(priv, DEBUGFS,
> +			"failed to allocate %zd bytes\n", count);
> +		goto err;
> +	}
> +
> +	memcpy(str_buf, buf, count);
> +
> +	while ((token = strsep(&str_buf, ",")) != NULL) {
> +		while (isspace(*token))
> +			++token;
> +		if (strict_strtol(token, HEXADECIMAL_RADIX, &val)) {
> +			LOG_ERROR(priv, DEBUGFS,
> +				  "failed to convert string to long %s\n",
> +				  token);
> +			goto err;
> +		}
> +
> +		mask  = val & 0xFF;
> +		src = (val & 0XFF00) >> 8;
> +		iwmct_log_set_filter(src, mask);
> +	}
> +
> +	kfree(str_buf);
> +	return count;
> +
> +err:
> +	kfree(str_buf);
> +	return -EAGAIN;
> +}
> +
> +ssize_t show_iwmct_log_level_fw(struct device *d,
> +			struct device_attribute *attr, char *buf)
> +{
> +	struct iwmct_priv *priv = d->driver_data;
> +	char *str_buf;
> +	int buf_size;
> +	ssize_t ret = -EAGAIN;
> +
> +	buf_size = (LOG_SRC_FORMAT * FW_LOG_SRC_MAX) + 2;
> +
> +	str_buf = kzalloc(buf_size, GFP_KERNEL);
> +	if (!str_buf) {
> +		LOG_ERROR(priv, DEBUGFS,
> +			"failed to allocate %d bytes\n", buf_size);
> +		goto exit;
> +	}
> +
> +	if (log_get_fw_filter_str(str_buf, buf_size) < 0)
> +		goto exit;
> +
> +	ret = sprintf(buf, "%s", str_buf);
> +
> +exit:
> +	kfree(str_buf);
> +	return ret;
> +}
> +
> +ssize_t store_iwmct_log_level_fw(struct device *d,
> +			struct device_attribute *attr,
> +			const char *buf, size_t count)
> +{
> +	struct iwmct_priv *priv = d->driver_data;
> +	char *token, *str_buf = NULL;
> +	long val;
> +	u8 src, mask;
> +	struct top_msg cmd;
> +	u16 cmdlen = 0;
> +	int i, rc;
> +
> +	if (!count)
> +		goto err;
> +
> +	str_buf = kzalloc(count, GFP_KERNEL);
> +	if (!str_buf) {
> +		LOG_ERROR(priv, DEBUGFS,
> +			"failed to allocate %zd bytes\n", count);
> +		goto err;
> +	}
> +
> +	memcpy(str_buf, buf, count);
> +
> +	cmd.hdr.type = COMM_TYPE_H2D;
> +	cmd.hdr.category = COMM_CATEGORY_DEBUG;
> +	cmd.hdr.opcode = CMD_DBG_LOG_LEVEL;
> +
> +	for (i = 0; ((token = strsep(&str_buf, ",")) != NULL) &&
> +		     (i < FW_LOG_SRC_MAX); i++) {
> +
> +		while (isspace(*token))
> +			++token;
> +
> +		if (strict_strtol(token, HEXADECIMAL_RADIX, &val)) {
> +			LOG_ERROR(priv, DEBUGFS,
> +				  "failed to convert string to long %s\n",
> +				  token);
> +			goto err;
> +		}
> +
> +		mask  = val & 0xFF; /* LSB */
> +		src = (val & 0XFF00) >> 8; /* 2nd least significant byte. */
> +		iwmct_log_set_fw_filter(src, mask);
> +
> +		cmd.u.logdefs[i].logsource = src;
> +		cmd.u.logdefs[i].sevmask = mask;
> +	}
> +
> +	cmd.hdr.length = cpu_to_le16(i * sizeof(cmd.u.logdefs[0]));
> +	cmdlen = (i * sizeof(cmd.u.logdefs[0]) + sizeof(cmd.hdr));
> +
> +	rc = iwmct_send_hcmd(priv, (u8 *) &cmd, cmdlen);
> +	if (rc) {
> +		LOG_ERROR(priv, DEBUGFS,
> +			  "Failed to send %d bytes of fwcmd, rc=%d\n",
> +			  cmdlen, rc);
> +		goto err;
> +	} else
> +		LOG_INFO(priv, DEBUGFS, "fwcmd sent (%d bytes)\n", cmdlen);
> +
> +	kfree(str_buf);
> +	return count;
> +
> +err:
> +	kfree(str_buf);
> +	return -EAGAIN;
> +}
> +
> diff --git a/drivers/misc/iwmc3200top/log.h b/drivers/misc/iwmc3200top/log.h
> new file mode 100644
> index 0000000..aba8121
> --- /dev/null
> +++ b/drivers/misc/iwmc3200top/log.h
> @@ -0,0 +1,158 @@
> +/*
> + * iwmc3200top - Intel Wireless MultiCom 3200 Top Driver
> + * drivers/misc/iwmc3200top/log.h
> + *
> + * Copyright (C) 2009 Intel Corporation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version
> + * 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> + * 02110-1301, USA.
> + *
> + *
> + * Author Name: Maxim Grabarnik <maxim.grabarnink-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> + *  -
> + *
> + */
> +
> +#ifndef __LOG_H__
> +#define __LOG_H__
> +
> +
> +/* log severity:
> + * The log levels here match FW log levels
> + * so values need to stay as is */
> +#define LOG_SEV_CRITICAL		0
> +#define LOG_SEV_ERROR			1
> +#define LOG_SEV_WARNING			2
> +#define LOG_SEV_INFO			3
> +#define LOG_SEV_INFOEX			4
> +
> +#define LOG_SEV_FILTER_ALL		\
> +	(BIT(LOG_SEV_CRITICAL) |	\
> +	 BIT(LOG_SEV_ERROR)    |	\
> +	 BIT(LOG_SEV_WARNING)  | 	\
> +	 BIT(LOG_SEV_INFO)     |	\
> +	 BIT(LOG_SEV_INFOEX))
> +
> +/* log source */
> +#define LOG_SRC_INIT			0
> +#define LOG_SRC_DEBUGFS			1
> +#define LOG_SRC_FW_DOWNLOAD		2
> +#define LOG_SRC_FW_MSG			3
> +#define LOG_SRC_TST			4
> +#define LOG_SRC_IRQ			5
> +
> +#define	LOG_SRC_MAX			6
> +#define	LOG_SRC_ALL			0xFF
> +
> +/**
> + * Default intitialization runtime log level
> + */
> +#ifndef LOG_SEV_FILTER_RUNTIME
> +#define LOG_SEV_FILTER_RUNTIME			\
> +	(BIT(LOG_SEV_CRITICAL)	|		\
> +	 BIT(LOG_SEV_ERROR)	|		\
> +	 BIT(LOG_SEV_WARNING))
> +#endif
> +
> +#ifndef FW_LOG_SEV_FILTER_RUNTIME
> +#define FW_LOG_SEV_FILTER_RUNTIME	LOG_SEV_FILTER_ALL
> +#endif
> +
> +#ifdef CONFIG_IWMC3200TOP_DEBUG
> +/**
> + * Log macros
> + */
> +
> +#define priv2dev(priv) (&(priv->func)->dev)
> +
> +#define LOG_CRITICAL(priv, src, fmt, args...)				\
> +do {									\
> +	if (iwmct_logdefs[LOG_SRC_ ## src] & BIT(LOG_SEV_CRITICAL))	\
> +		dev_crit(priv2dev(priv), "%s %d: " fmt,			\
> +			__func__, __LINE__, ##args);			\
> +} while (0)
> +
> +#define LOG_ERROR(priv, src, fmt, args...)				\
> +do {									\
> +	if (iwmct_logdefs[LOG_SRC_ ## src] & BIT(LOG_SEV_ERROR))	\
> +		dev_err(priv2dev(priv), "%s %d: " fmt,			\
> +			__func__, __LINE__, ##args);			\
> +} while (0)
> +
> +#define LOG_WARNING(priv, src, fmt, args...)				\
> +do {									\
> +	if (iwmct_logdefs[LOG_SRC_ ## src] & BIT(LOG_SEV_WARNING))	\
> +		dev_warn(priv2dev(priv), "%s %d: " fmt,			\
> +			 __func__, __LINE__, ##args);			\
> +} while (0)
> +
> +#define LOG_INFO(priv, src, fmt, args...)				\
> +do {									\
> +	if (iwmct_logdefs[LOG_SRC_ ## src] & BIT(LOG_SEV_INFO))		\
> +		dev_info(priv2dev(priv), "%s %d: " fmt,			\
> +			 __func__, __LINE__, ##args);			\
> +} while (0)
> +
> +#define LOG_INFOEX(priv, src, fmt, args...)				\
> +do {									\
> +	if (iwmct_logdefs[LOG_SRC_ ## src] & BIT(LOG_SEV_INFOEX))	\
> +		dev_dbg(priv2dev(priv), "%s %d: " fmt,			\
> +			 __func__, __LINE__, ##args);			\
> +} while (0)
> +
> +#define LOG_HEXDUMP(src, ptr, len)					\
> +do {									\
> +	if (iwmct_logdefs[LOG_SRC_ ## src] & BIT(LOG_SEV_INFOEX))	\
> +		print_hex_dump(KERN_DEBUG, "", DUMP_PREFIX_NONE,	\
> +				16, 1, ptr, len, false);		\
> +} while (0)
> +
> +void iwmct_log_top_message(struct iwmct_priv *priv, u8 *buf, int len);
> +
> +extern u8 iwmct_logdefs[];
> +
> +int iwmct_log_set_filter(u8 src, u8 logmask);
> +int iwmct_log_set_fw_filter(u8 src, u8 logmask);
> +
> +ssize_t show_iwmct_log_level(struct device *d,
> +			struct device_attribute *attr, char *buf);
> +ssize_t store_iwmct_log_level(struct device *d,
> +			struct device_attribute *attr,
> +			const char *buf, size_t count);
> +ssize_t show_iwmct_log_level_fw(struct device *d,
> +			struct device_attribute *attr, char *buf);
> +ssize_t store_iwmct_log_level_fw(struct device *d,
> +			struct device_attribute *attr,
> +			const char *buf, size_t count);
> +
> +#else
> +
> +#define LOG_CRITICAL(priv, src, fmt, args...)
> +#define LOG_ERROR(priv, src, fmt, args...)
> +#define LOG_WARNING(priv, src, fmt, args...)
> +#define LOG_INFO(priv, src, fmt, args...)
> +#define LOG_INFOEX(priv, src, fmt, args...)
> +#define LOG_HEXDUMP(src, ptr, len)
> +
> +static inline void iwmct_log_top_message(struct iwmct_priv *priv,
> +					 u8 *buf, int len) {}
> +static inline int iwmct_log_set_filter(u8 src, u8 logmask) { return 0; }
> +static inline int iwmct_log_set_fw_filter(u8 src, u8 logmask) { return 0; }
> +
> +#endif /* CONFIG_IWMC3200TOP_DEBUG */
> +
> +int log_get_filter_str(char *buf, int size);
> +int log_get_fw_filter_str(char *buf, int size);
> +
> +#endif /* __LOG_H__ */

Seriously. Please just use dynamic_printk and lets remove all this log
infrastructure. It is way too much overhead for too little benefit.

> diff --git a/drivers/misc/iwmc3200top/main.c b/drivers/misc/iwmc3200top/main.c
> new file mode 100644
> index 0000000..09f8d06
> --- /dev/null
> +++ b/drivers/misc/iwmc3200top/main.c
> @@ -0,0 +1,702 @@
> +/*
> + * iwmc3200top - Intel Wireless MultiCom 3200 Top Driver
> + * drivers/misc/iwmc3200top/main.c
> + *
> + * Copyright (C) 2009 Intel Corporation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License version
> + * 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> + * 02110-1301, USA.
> + *
> + *
> + * Author Name: Maxim Grabarnik <maxim.grabarnink-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> + *  -
> + *
> + */
> +
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/mmc/sdio_ids.h>
> +#include <linux/mmc/sdio_func.h>
> +#include <linux/mmc/sdio.h>
> +
> +#include "iwmc3200top.h"
> +#include "log.h"
> +#include "fw-msg.h"
> +#include "debugfs.h"
> +
> +
> +#define DRIVER_DESCRIPTION "Intel(R) IWMC 3200 Top Driver"
> +#define DRIVER_COPYRIGHT "Copyright (c) 2008 Intel Corporation."
> +
> +#define IWMCT_VERSION "0.1.62"
> +
> +#ifdef REPOSITORY_LABEL
> +#define RL REPOSITORY_LABEL
> +#else
> +#define RL ""
> +#endif
> +
> +#ifdef CONFIG_IWMC3200TOP_DEBUG
> +#define VD "-d"
> +#else
> +#define VD
> +#endif
> +
> +#define DRIVER_VERSION IWMCT_VERSION "-"  __stringify(RL) VD
> +
> +MODULE_DESCRIPTION(DRIVER_DESCRIPTION);
> +MODULE_VERSION(DRIVER_VERSION);
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR(DRIVER_COPYRIGHT);
> +
> +
> +#ifndef SDIO_DEVICE_ID_INTEL_IWMC3200TOP
> +#define SDIO_DEVICE_ID_INTEL_IWMC3200TOP	0x1404
> +#endif
> +
> +/*
> + * This workers main task is to wait for OP_OPR_ALIVE
> + * from TOP FW until ALIVE_MSG_TIMOUT timeout is elapsed.
> + * When OP_OPR_ALIVE received it will issue
> + * a call to "bus_rescan_devices".
> + */
> +static void iwmct_rescan_worker(struct work_struct *ws)
> +{
> +	struct iwmct_priv *priv;
> +	int ret;
> +
> +	priv = container_of(ws, struct iwmct_priv, bus_rescan_worker);
> +
> +	LOG_INFO(priv, FW_MSG, "Calling bus_rescan\n");
> +
> +	ret = bus_rescan_devices(priv->func->dev.bus);
> +	if (ret < 0)
> +		LOG_INFO(priv, FW_DOWNLOAD, "bus_rescan_devices FAILED!!!\n");
> +}
> +
> +static void op_top_message(struct iwmct_priv *priv, struct top_msg *msg)
> +{
> +	switch (msg->hdr.opcode) {
> +	case OP_OPR_ALIVE:
> +		LOG_INFO(priv, FW_MSG, "Got ALIVE from device, wake rescan\n");
> +		queue_work(priv->bus_rescan_wq, &priv->bus_rescan_worker);
> +		break;
> +	default:
> +		LOG_INFO(priv, FW_MSG, "Received msg opcode 0x%X\n",
> +			msg->hdr.opcode);
> +		break;
> +	}
> +}
> +
> +
> +static void handle_top_message(struct iwmct_priv *priv, u8 *buf, int len)
> +{
> +	struct top_msg *msg;
> +
> +	msg = (struct top_msg *)buf;
> +
> +	if (msg->hdr.type != COMM_TYPE_D2H) {
> +		LOG_ERROR(priv, FW_MSG,
> +			"Message from TOP with invalid message type 0x%X\n",
> +			msg->hdr.type);
> +		return;
> +	}
> +
> +	if (len < sizeof(msg->hdr)) {
> +		LOG_ERROR(priv, FW_MSG,
> +			"Message from TOP is too short for message header "
> +			"received %d bytes, expected at least %zd bytes\n",
> +			len, sizeof(msg->hdr));
> +		return;
> +	}
> +
> +	if (len < le16_to_cpu(msg->hdr.length) + sizeof(msg->hdr)) {
> +		LOG_ERROR(priv, FW_MSG,
> +			"Message length (%d bytes) is shorter than "
> +			"in header (%d bytes)\n",
> +			len, le16_to_cpu(msg->hdr.length));
> +		return;
> +	}
> +
> +	switch (msg->hdr.category) {
> +	case COMM_CATEGORY_OPERATIONAL:
> +		op_top_message(priv, (struct top_msg *)buf);
> +		break;
> +
> +	case COMM_CATEGORY_DEBUG:
> +	case COMM_CATEGORY_TESTABILITY:
> +	case COMM_CATEGORY_DIAGNOSTICS:
> +		iwmct_log_top_message(priv, buf, len);
> +		break;
> +
> +	default:
> +		LOG_ERROR(priv, FW_MSG,
> +			"Message from TOP with unknown category 0x%X\n",
> +			msg->hdr.category);
> +		break;
> +	}
> +}
> +
> +int iwmct_send_hcmd(struct iwmct_priv *priv, u8 *cmd, u16 len)
> +{
> +	int ret;
> +	u8 *buf;
> +
> +	LOG_INFOEX(priv, FW_MSG, "Sending hcmd:\n");
> +
> +	/* add padding to 256 for IWMC */
> +	((struct top_msg *)cmd)->hdr.flags |= CMD_FLAG_PADDING_256;
> +
> +	LOG_HEXDUMP(FW_MSG, cmd, len);
> +
> +	if (len > FW_HCMD_BLOCK_SIZE) {
> +		LOG_ERROR(priv, FW_MSG, "size %d exceeded hcmd max size %d\n",
> +			  len, FW_HCMD_BLOCK_SIZE);
> +		return -1;
> +	}
> +
> +	buf = kzalloc(FW_HCMD_BLOCK_SIZE, GFP_KERNEL);
> +	if (!buf) {
> +		LOG_ERROR(priv, FW_MSG, "kzalloc error, buf size %d\n",
> +			  FW_HCMD_BLOCK_SIZE);
> +		return -1;
> +	}
> +
> +	memcpy(buf, cmd, len);
> +
> +	sdio_claim_host(priv->func);
> +	ret = sdio_memcpy_toio(priv->func, IWMC_SDIO_DATA_ADDR, buf,
> +			       FW_HCMD_BLOCK_SIZE);
> +	sdio_release_host(priv->func);
> +
> +	kfree(buf);
> +	return ret;
> +}
> +
> +int iwmct_tx(struct iwmct_priv *priv, unsigned int addr,
> +	void *src, int count)
> +{
> +	int ret;
> +
> +	sdio_claim_host(priv->func);
> +	ret = sdio_memcpy_toio(priv->func, addr, src, count);
> +	sdio_release_host(priv->func);
> +
> +	return ret;
> +}
> +
> +static void iwmct_irq_read_worker(struct work_struct *ws)
> +{
> +	struct iwmct_priv *priv;
> +	struct iwmct_work_struct *read_req;
> +	__le32 *buf = NULL;
> +	int ret, val, i;
> +	int iosize;
> +	u32 barker;
> +
> +	priv = container_of(ws, struct iwmct_priv, isr_worker);
> +
> +	LOG_INFO(priv, IRQ, "enter iwmct_irq_read_worker %p\n", ws);
> +
> +	/* --------------------- Handshake with device -------------------- */
> +	sdio_claim_host(priv->func);
> +
> +	/* all list manipulations have to be protected by
> +	 * sdio_claim_host/sdio_release_host */
> +	if (list_empty(&priv->read_req_list)) {
> +		LOG_ERROR(priv, IRQ, "read_req_list empty in read worker\n");
> +		goto exit_release;
> +	}
> +
> +	read_req = list_entry(priv->read_req_list.next,
> +			      struct iwmct_work_struct, list);
> +
> +	list_del(&read_req->list);
> +	iosize = read_req->iosize;
> +	kfree(read_req);
> +
> +	buf = kzalloc(iosize, GFP_KERNEL);
> +	if (!buf) {
> +		LOG_ERROR(priv, IRQ, "kzalloc error, buf size %d\n", iosize);
> +		goto exit_release;
> +	}
> +
> +	LOG_INFO(priv, IRQ, "iosize=%d, buf=%p, func=%d\n",
> +				iosize, buf, priv->func->num);
> +
> +	/* read from device */
> +	ret = sdio_memcpy_fromio(priv->func, buf, IWMC_SDIO_DATA_ADDR, iosize);
> +	if (ret) {
> +		LOG_ERROR(priv, IRQ, "error %d reading buffer\n", ret);
> +		goto exit_release;
> +	}
> +
> +	LOG_HEXDUMP(IRQ, (u8 *)buf, iosize);
> +
> +	barker = le32_to_cpu(buf[0]);
> +	if (barker != IWMC_BARKER_ACK &&
> +	   (barker & BARKER_DNLOAD_BARKER_MSK) != IWMC_BARKER_REBOOT) {
> +		/* Handle Top Commhub message.
> +		 * Current handling - treat all message as
> +		 * logger messages and print the context as a string. */
> +		sdio_release_host(priv->func);
> +		handle_top_message(priv, (u8 *)buf, iosize);
> +		goto exit;
> +	}
> +
> +	/* verify basic barker sanity */
> +	for (i = 1; i < 4; i++)
> +		if (buf[i] != buf[0]) {
> +			LOG_ERROR(priv, IRQ, "Corrupted barker,"
> +					     "buf[%d]=%x buf[0]=%x\n",
> +					     i, buf[i], buf[0]);
> +			goto exit_release;
> +		}
> +
> +	val = atomic_read(&priv->dev_sync);
> +
> +	switch (val) {
> +	case 0:
> +		/* we expect to recieve REBOOT BARKER */
> +		if ((barker & BARKER_DNLOAD_BARKER_MSK) != IWMC_BARKER_REBOOT) {
> +			LOG_ERROR(priv, IRQ, "Expecting for reboot barker %x,"
> +					"but got %x\n",
> +					IWMC_BARKER_REBOOT,
> +					(barker & BARKER_DNLOAD_BARKER_MSK));
> +			goto exit_release;
> +		}
> +
> +		LOG_INFO(priv, IRQ, "Recieved reboot barker: %x\n", barker);
> +		priv->barker = barker;
> +
> +		if (barker & BARKER_DNLOAD_SYNC_MSK) {
> +			/* Send the same barker back */
> +			ret = sdio_memcpy_toio(priv->func, IWMC_SDIO_DATA_ADDR,
> +					       buf, iosize);
> +			if (ret) {
> +				LOG_ERROR(priv, IRQ,
> +					 "error %d echoing barker\n", ret);
> +				goto exit_release;
> +			}
> +			LOG_INFO(priv, IRQ, "Echoing barker to device\n");
> +			atomic_inc(&priv->dev_sync);
> +			goto exit_release;
> +		}
> +		break;
> +
> +	case 1:
> +		/* we expect to recieve ACK BARKER */
> +		if (barker != IWMC_BARKER_ACK) {
> +			LOG_INFO(priv, IRQ, "Expecting for ACK barker[%x],"
> +				 "got this instead %x\n",
> +				 IWMC_BARKER_ACK, barker);
> +			goto exit_release;
> +		}
> +		kfree(buf);
> +		atomic_set(&priv->dev_sync, 0);
> +		atomic_set(&priv->reset, 0);
> +		break;
> +
> +	default:
> +		LOG_INFO(priv, IRQ, "Wrong dev_sync %d\n", val);
> +		break;
> +	}
> +
> +	sdio_release_host(priv->func);
> +
> +
> +	LOG_INFO(priv, IRQ, "barker download request 0x%x is:\n", priv->barker);
> +	LOG_INFO(priv, IRQ, "*******  Top FW %s requested ********\n",
> +			(priv->barker & BARKER_DNLOAD_TOP_MSK) ? "was" : "not");
> +	LOG_INFO(priv, IRQ, "*******  GPS FW %s requested ********\n",
> +			(priv->barker & BARKER_DNLOAD_GPS_MSK) ? "was" : "not");
> +	LOG_INFO(priv, IRQ, "*******  BT FW %s requested ********\n",
> +			(priv->barker & BARKER_DNLOAD_BT_MSK) ? "was" : "not");
> +
> +	if (priv->dbg.fw_download)
> +		iwmct_fw_load(priv);
> +	else
> +		LOG_ERROR(priv, IRQ, "FW download not allowed\n");
> +
> +	return;
> +
> +exit_release:
> +	sdio_release_host(priv->func);
> +exit:
> +	kfree(buf);
> +	LOG_INFO(priv, IRQ, "exit iwmct_irq_read_worker\n");
> +}
> +
> +static void iwmct_irq(struct sdio_func *func)
> +{
> +	struct iwmct_priv *priv;
> +	int val, ret;
> +	int iosize;
> +	int addr = IWMC_SDIO_INTR_GET_SIZE_ADDR;
> +	struct iwmct_work_struct *read_req;
> +
> +	priv = sdio_get_drvdata(func);
> +
> +	LOG_INFO(priv, IRQ, "enter iwmct_irq\n");
> +
> +	/* read the function's status register */
> +	val = sdio_readb(func, IWMC_SDIO_INTR_STATUS_ADDR, &ret);
> +
> +	LOG_INFO(priv, IRQ, "iir value = %d, ret=%d\n", val, ret);
> +
> +	if (!val) {
> +		LOG_ERROR(priv, IRQ, "iir = 0, exiting ISR\n");
> +		goto exit_clear_intr;
> +	}
> +
> +
> +	/*
> +	 * read 2 bytes of the transaction size
> +	 * IMPORTANT: sdio transaction size has to be read before clearing
> +	 * sdio interrupt!!!
> +	 */
> +	val = sdio_readb(priv->func, addr++, &ret);
> +	iosize = val;
> +	val = sdio_readb(priv->func, addr++, &ret);
> +	iosize += val << 8;
> +
> +	LOG_INFO(priv, IRQ, "READ size %d\n", iosize);
> +
> +	if (iosize == 0) {
> +		LOG_ERROR(priv, IRQ, "READ size %d, exiting ISR\n", iosize);
> +		goto exit_clear_intr;
> +	}
> +
> +	/* allocate a work structure to pass iosize to the worker */
> +	read_req = kzalloc(sizeof(struct iwmct_work_struct), GFP_KERNEL);
> +	if (!read_req) {
> +		LOG_ERROR(priv, IRQ, "failed to allocate read_req, exit ISR\n");
> +		goto exit_clear_intr;
> +	}
> +
> +	INIT_LIST_HEAD(&read_req->list);
> +	read_req->iosize = iosize;
> +
> +	list_add_tail(&priv->read_req_list, &read_req->list);
> +
> +	/* clear the function's interrupt request bit (write 1 to clear) */
> +	sdio_writeb(func, 1, IWMC_SDIO_INTR_CLEAR_ADDR, &ret);
> +
> +	queue_work(priv->wq, &priv->isr_worker);
> +
> +	LOG_INFO(priv, IRQ, "exit iwmct_irq\n");
> +
> +	return;
> +
> +exit_clear_intr:
> +	/* clear the function's interrupt request bit (write 1 to clear) */
> +	sdio_writeb(func, 1, IWMC_SDIO_INTR_CLEAR_ADDR, &ret);
> +}
> +
> +
> +static int blocks;
> +module_param(blocks, int, 0604);
> +MODULE_PARM_DESC(blocks, "max_blocks_to_send");
> +
> +static int dump;
> +module_param(dump, bool, 0604);
> +MODULE_PARM_DESC(dump, "dump_hex_content");
> +
> +static int jump = 1;
> +module_param(jump, bool, 0604);
> +
> +static int direct = 1;
> +module_param(direct, bool, 0604);
> +
> +static int checksum = 1;
> +module_param(checksum, bool, 0604);
> +
> +static int fw_download = 1;
> +module_param(fw_download, bool, 0604);
> +
> +static int block_size = IWMC_SDIO_BLK_SIZE;
> +module_param(block_size, int, 0404);
> +
> +static int download_trans_blks = IWMC_DEFAULT_TR_BLK;
> +module_param(download_trans_blks, int, 0604);
> +
> +static int rubbish_barker;
> +module_param(rubbish_barker, bool, 0604);

Most of them are missing descriptions. And do we really need them? What
cases are they trying to work around?

> +
> +#ifdef CONFIG_IWMC3200TOP_DEBUG
> +static int log_level[LOG_SRC_MAX];
> +static unsigned int log_level_argc;
> +module_param_array(log_level, int, &log_level_argc, 0604);
> +MODULE_PARM_DESC(log_level, "log_level");
> +
> +static int log_level_fw[FW_LOG_SRC_MAX];
> +static unsigned int log_level_fw_argc;
> +module_param_array(log_level_fw, int, &log_level_fw_argc, 0604);
> +MODULE_PARM_DESC(log_level_fw, "log_level_fw");
> +#endif
> +
> +void iwmct_dbg_init_params(struct iwmct_priv *priv)
> +{
> +#ifdef CONFIG_IWMC3200TOP_DEBUG
> +	int i;
> +
> +	for (i = 0; i < log_level_argc; i++) {
> +		dev_notice(&priv->func->dev, "log_level[%d]=0x%X\n",
> +						i, log_level[i]);
> +		iwmct_log_set_filter((log_level[i] >> 8) & 0xFF,
> +			       log_level[i] & 0xFF);
> +	}
> +	for (i = 0; i < log_level_fw_argc; i++) {
> +		dev_notice(&priv->func->dev, "log_level_fw[%d]=0x%X\n",
> +						i, log_level_fw[i]);
> +		iwmct_log_set_fw_filter((log_level_fw[i] >> 8) & 0xFF,
> +				  log_level_fw[i] & 0xFF);
> +	}
> +#endif
> +
> +	priv->dbg.blocks = blocks;
> +	LOG_INFO(priv, INIT, "blocks=%d\n", blocks);
> +	priv->dbg.dump = (bool)dump;
> +	LOG_INFO(priv, INIT, "dump=%d\n", dump);
> +	priv->dbg.jump = (bool)jump;
> +	LOG_INFO(priv, INIT, "jump=%d\n", jump);
> +	priv->dbg.direct = (bool)direct;
> +	LOG_INFO(priv, INIT, "direct=%d\n", direct);
> +	priv->dbg.checksum = (bool)checksum;
> +	LOG_INFO(priv, INIT, "checksum=%d\n", checksum);
> +	priv->dbg.fw_download = (bool)fw_download;
> +	LOG_INFO(priv, INIT, "fw_download=%d\n", fw_download);
> +	priv->dbg.block_size = block_size;
> +	LOG_INFO(priv, INIT, "block_size=%d\n", block_size);
> +	priv->dbg.download_trans_blks = download_trans_blks;
> +	LOG_INFO(priv, INIT, "download_trans_blks=%d\n", download_trans_blks);
> +}
> +
> +/*****************************************************************************
> + *
> + * sysfs attributes
> + *
> + *****************************************************************************/
> +static ssize_t show_iwmct_fw_version(struct device *d,
> +				  struct device_attribute *attr, char *buf)
> +{
> +	return sprintf(buf, "%s\n",
> +		       ((struct iwmct_priv *)d->driver_data)->dbg.label_fw);
> +}
> +static DEVICE_ATTR(cc_label_fw, S_IRUGO, show_iwmct_fw_version, NULL);
> +
> +#ifdef CONFIG_IWMC3200TOP_DEBUG
> +static DEVICE_ATTR(log_level, S_IWUSR | S_IRUGO,
> +		   show_iwmct_log_level, store_iwmct_log_level);
> +static DEVICE_ATTR(log_level_fw, S_IWUSR | S_IRUGO,
> +		   show_iwmct_log_level_fw, store_iwmct_log_level_fw);
> +#endif

First module parameters and now device attributes. Why?

> +
> +static struct attribute *iwmct_sysfs_entries[] = {
> +	&dev_attr_cc_label_fw.attr,
> +#ifdef CONFIG_IWMC3200TOP_DEBUG
> +	&dev_attr_log_level.attr,
> +	&dev_attr_log_level_fw.attr,
> +#endif
> +	NULL
> +};
> +
> +static struct attribute_group iwmct_attribute_group = {
> +	.name = NULL,		/* put in device directory */
> +	.attrs = iwmct_sysfs_entries,
> +};
> +
> +
> +static int iwmct_probe(struct sdio_func *func,
> +			   const struct sdio_device_id *id)
> +{
> +	struct iwmct_priv *priv;
> +	int ret;
> +	int val = 1;
> +	int addr = IWMC_SDIO_INTR_ENABLE_ADDR;
> +
> +	dev_info(&func->dev, "enter iwmct_probe\n");
> +
> +	dev_info(&func->dev, "IRQ polling period id %u msecs, HZ is %d\n",
> +		jiffies_to_msecs(2147483647), HZ);

This is debug code. Don't use dev_info().

> +
> +	priv = kzalloc(sizeof(struct iwmct_priv), GFP_KERNEL);
> +	if (!priv) {
> +		dev_err(&func->dev, "kzalloc error\n");
> +		return -ENOMEM;
> +	}
> +	priv->func = func;
> +	sdio_set_drvdata(func, priv);
> +
> +	LOG_INFO(priv, INIT, "HIDR supported\n");
> +
> +	/* create drivers work queue */
> +	priv->wq = create_workqueue(DRV_NAME "_wq");
> +	priv->bus_rescan_wq = create_workqueue(DRV_NAME "_rescan_wq");
> +	INIT_WORK(&priv->bus_rescan_worker, iwmct_rescan_worker);
> +	INIT_WORK(&priv->isr_worker, iwmct_irq_read_worker);
> +
> +	init_waitqueue_head(&priv->wait_q);
> +
> +	sdio_claim_host(func);
> +	/* FIXME: Remove after it is fixed in the Boot ROM upgrade */
> +	func->enable_timeout = 10;
> +
> +	/* In our HW, setting the block size also wakes up the boot rom. */
> +	ret = sdio_set_block_size(func, priv->dbg.block_size);
> +	if (ret) {
> +		LOG_ERROR(priv, INIT,
> +			"sdio_set_block_size() failure: %d\n", ret);
> +		goto error_sdio_enable;
> +	}
> +
> +	ret = sdio_enable_func(func);
> +	if (ret) {
> +		LOG_ERROR(priv, INIT, "sdio_enable_func() failure: %d\n", ret);
> +		goto error_sdio_enable;
> +	}
> +
> +	/* init reset and dev_sync states */
> +	atomic_set(&priv->reset, 0);
> +	atomic_set(&priv->dev_sync, 0);
> +
> +	/* init read req queue */
> +	INIT_LIST_HEAD(&priv->read_req_list);
> +
> +	/* process configurable parameters */
> +	iwmct_dbg_init_params(priv);
> +	ret = sysfs_create_group(&func->dev.kobj, &iwmct_attribute_group);
> +	if (ret) {
> +		LOG_ERROR(priv, INIT, "Failed to register attributes and "
> +			 "initialize module_params\n");
> +		goto error_dev_attrs;
> +	}
> +
> +	iwmct_dbgfs_register(priv, DRV_NAME);
> +
> +	if (!priv->dbg.direct && priv->dbg.download_trans_blks > 8) {
> +		LOG_INFO(priv, INIT,
> +			 "Reducing transaction to 8 blocks = 2K (from %d)\n",
> +			 priv->dbg.download_trans_blks);
> +		priv->dbg.download_trans_blks = 8;
> +	}
> +	priv->trans_len = priv->dbg.download_trans_blks * priv->dbg.block_size;
> +	LOG_INFO(priv, INIT, "Transaction length = %d\n", priv->trans_len);
> +
> +	ret = sdio_claim_irq(func, iwmct_irq);
> +	if (ret) {
> +		LOG_ERROR(priv, INIT, "sdio_claim_irq() failure: %d\n", ret);
> +		goto error_claim_irq;
> +	}
> +
> +
> +	/* Enable function's interrupt */
> +	sdio_writeb(priv->func, val, addr, &ret);
> +	if (ret) {
> +		LOG_ERROR(priv, INIT, "Failure writing to "
> +			  "Interrupt Enable Register (%d): %d\n", addr, ret);
> +		goto error_enable_int;
> +	}
> +
> +	sdio_release_host(func);
> +
> +	LOG_INFO(priv, INIT, "exit iwmct_probe\n");
> +
> +	return ret;
> +
> +error_enable_int:
> +	sdio_release_irq(func);
> +error_claim_irq:
> +	sdio_disable_func(func);
> +error_dev_attrs:
> +	iwmct_dbgfs_unregister(priv->dbgfs);
> +	sysfs_remove_group(&func->dev.kobj, &iwmct_attribute_group);
> +error_sdio_enable:
> +	sdio_release_host(func);
> +	return ret;
> +}
> +
> +static void iwmct_remove(struct sdio_func *func)
> +{
> +	struct iwmct_work_struct *read_req;
> +	struct iwmct_priv *priv = sdio_get_drvdata(func);
> +
> +	priv = sdio_get_drvdata(func);
> +
> +	LOG_INFO(priv, INIT, "enter\n");
> +
> +	sdio_claim_host(func);
> +	sdio_release_irq(func);
> +	sdio_release_host(func);
> +
> +	/* Safely destroy osc workqueue */
> +	destroy_workqueue(priv->bus_rescan_wq);
> +	destroy_workqueue(priv->wq);
> +
> +	sdio_claim_host(func);
> +	sdio_disable_func(func);
> +	sysfs_remove_group(&func->dev.kobj, &iwmct_attribute_group);
> +	iwmct_dbgfs_unregister(priv->dbgfs);
> +	sdio_release_host(func);
> +
> +	/* free read requests */
> +	while (!list_empty(&priv->read_req_list)) {
> +		read_req = list_entry(priv->read_req_list.next,
> +			struct iwmct_work_struct, list);
> +
> +		list_del(&read_req->list);
> +		kfree(read_req);
> +	}
> +
> +	kfree(priv);
> +}
> +
> +
> +static const struct sdio_device_id iwmct_ids[] = {
> +	{ SDIO_DEVICE(SDIO_VENDOR_ID_INTEL, SDIO_DEVICE_ID_INTEL_IWMC3200TOP)},
> +	{ /* end: all zeroes */	},
> +};
> +
> +MODULE_DEVICE_TABLE(sdio, iwmct_ids);
> +
> +static struct sdio_driver iwmct_driver = {
> +	.probe		= iwmct_probe,
> +	.remove		= iwmct_remove,
> +	.name		= DRV_NAME,
> +	.id_table	= iwmct_ids,
> +};
> +
> +static int __init iwmct_init(void)
> +{
> +	int rc;
> +
> +	/* Default log filter settings */
> +	iwmct_log_set_filter(LOG_SRC_ALL, LOG_SEV_FILTER_RUNTIME);
> +	iwmct_log_set_filter(LOG_SRC_FW_MSG, LOG_SEV_FILTER_ALL);
> +	iwmct_log_set_fw_filter(LOG_SRC_ALL, FW_LOG_SEV_FILTER_RUNTIME);
> +
> +	rc = sdio_register_driver(&iwmct_driver);
> +
> +	return rc;
> +}

Actually return sdio_register_driver() would do the same trick.

> +
> +static void __exit iwmct_exit(void)
> +{
> +	sdio_unregister_driver(&iwmct_driver);
> +}
> +
> +module_init(iwmct_init);
> +module_exit(iwmct_exit);
> +

At some point I got lost in this code. It contains more debug statements
and infrastructure than actual code as it seems. Can you please clean
this up. I mentioned it above. Just use dynamic_printk and you should be
have something clean and simple.

Regards

Marcel


--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 00/31] Swap over NFS -v20
From: Pavel Machek @ 2009-10-10 12:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Suresh Jayaraman, Linus Torvalds,
	Andrew Morton, linux-kernel, linux-mm, netdev, Neil Brown,
	Miklos Szeredi, Wouter Verhelst, trond.myklebust
In-Reply-To: <1254692482.21044.15.camel@laptop>

Hi!

> > One of them
> > would be the whole VM/net work to just make swap over nbd/iscsi safe.
> 
> Getting those two 'fixed' is going to be tons of interesting work
> because they involve interaction with userspace daemons.
> 
> NBD has fairly simple userspace, but iSCSI has a rather large userspace
> footprint and a rather complicated user/kernel interaction which will be
> mighty interesting to get allocation safe.
> 
> Ideally the swap-over-$foo bits have no userspace component.
> 
> That said, Wouter is the NBD userspace maintainer and has expressed
> interest into looking at making that work, but its sure going to be
> non-trivial, esp. since exposing PF_MEMALLOC to userspace is a, not over
> my dead-bodym like thing.

Well, as long as nbd-server is on separate machine (with real swap),
safe swapping over network should be ok, without PF_MEMALLOC for
userspace or similar nightmares, right?
								Pavel  

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox