* Re: Stop using tasklets for bottom halves
From: Steven Rostedt @ 2009-09-08 17:27 UTC
To: Stephen Hemminger
Cc: Luis R. Rodriguez, Ingo Molnar, Michael Buesch, John W. Linville,
linux-wireless, linux-kernel, netdev, Matt Smith, Kevin Hayes,
Bob Copeland, Jouni Malinen, Ivan Seskar, ic.felix, Thomas Gleixner
In-Reply-To: <20090908100144.6e06872b@nehalam>
[ added Thomas Gleixner to Cc]
On Tue, 2009-09-08 at 10:01 -0700, Stephen Hemminger wrote:
> On Tue, 08 Sep 2009 12:40:23 -0400
> Steven Rostedt wrote:
>
> > On Tue, 2009-09-08 at 09:11 -0700, Stephen Hemminger wrote:
> >
> > > > > Process context is too slow.
> > > >
> > > > Well, I'm hoping to prove the opposite. I'm working on some stuff that I
> > > > plan to present at Linux Plumbers. I've been too distracted by other
> > > > things, but hopefully I'll have some good numbers to present by then.
> > > >
> > >
> > >
> > > That's great, does it keep the good properties of NAPI (irq disabling
> > > and throttling?)
> >
> > Not exactly sure what you mean by throttling, but I'm assuming it will.
> >
> > As for irqs disabling, I'm trying to avoid doing that. Note, the device
> > will have its interrupts disabled, but not all other devices will.
> >
> > -- Steve
> >
> >
>
> The way NAPI works is that in irq routine, the device disables interrupts
> then schedules processing packets, when processing is done irq's are re-enabled.
> This means that if machine is being flooded, irq's stay off, and the packets
> get discarded (because device hardware ring is full), rather than in software
> (because software receive queue is full).
That sounds exactly like what threaded IRQs will do. When an interrupt
comes in, the device driver will disable the device interrupts, and then
the device irq thread handler is awoken.
The device irq handler will handle all the packets. If new packets come
in, and the hardware ring buffer is full, those packets will be dropped.
When the irq handler thread is done processing all pending packets, it
will re-enable the device's interrupts and go to sleep.
Yeah, looking at the NAPI code, it does seem to follow what threaded
interrupts do.
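The flow described above can be sketched as a small userspace model. This is only an illustration, not kernel code: struct toy_nic, toy_rx(), toy_poll() and RING_SIZE are all hypothetical names, and the "interrupt" is just a flag. It shows that once the device interrupt is masked, a flood fills the ring and further packets are dropped by the "hardware" until the handler drains the ring and unmasks the interrupt.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 4 /* hypothetical hardware RX ring depth */

struct toy_nic {
	unsigned int ring_used;  /* packets waiting in the hardware ring */
	unsigned int hw_dropped; /* packets discarded because the ring was full */
	unsigned int processed;  /* packets handled by the poll/thread step */
	bool irq_enabled;        /* state of the device interrupt */
	bool handler_woken;      /* poll routine or irq thread has been scheduled */
};

/* A packet arrives from the wire. */
static void toy_rx(struct toy_nic *nic)
{
	if (nic->ring_used == RING_SIZE) {
		nic->hw_dropped++; /* dropped in "hardware", no software queue involved */
		return;
	}
	nic->ring_used++;
	if (nic->irq_enabled) {
		/* hard irq step: mask the device interrupt and wake the handler */
		nic->irq_enabled = false;
		nic->handler_woken = true;
	}
}

/*
 * Poll/thread step: handle up to budget packets. The device interrupt is
 * only re-enabled once the ring is empty.
 */
static unsigned int toy_poll(struct toy_nic *nic, unsigned int budget)
{
	unsigned int done = 0;

	assert(budget > 0);
	while (nic->ring_used > 0 && done < budget) {
		nic->ring_used--;
		nic->processed++;
		done++;
	}
	if (nic->ring_used == 0) {
		nic->handler_woken = false;
		nic->irq_enabled = true;
	}
	return done;
}
```

Calling toy_rx() ten times on a fresh device with irq_enabled set leaves four packets in the ring and six counted as hardware drops; one toy_poll() call with a large budget then drains the ring and unmasks the interrupt.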
-- Steve
* Re: [PATCH 1/1] netpoll: fix race between skb_queue_len and skb_dequeue
From: Matt Mackall @ 2009-09-08 17:27 UTC
To: DDD; +Cc: David Miller, netdev
In-Reply-To: <1252396189.16528.19.camel@dengdd-desktop>
On Tue, 2009-09-08 at 15:49 +0800, DDD wrote:
> This race will break the messages order.
>
> Sequence of events in the problem case:
>
> Assume there is one skb "A" in skb_queue and the next action of
> netpoll_send_skb() is: sending out "B" skb.
> The right order of messages should be: send A first, then send B.
>
> But as following orders, it will send B first, then send A.
I would say no, the order of messages A and B queued on different CPUs
is undefined. The only issue is that we can queue a message A on CPU0,
then causally trigger a message B on CPU1 that arrives first. But bear
in mind that the message A >>may never arrive<< because the message is
about a lockup that kills processing of delayed work.
Generally speaking, queueing should be a last-ditch effort. We should
instead aim to deliver all messages immediately, even if they might be
out of order, because out of order is better than not arriving at all.
--
http://selenic.com : development and support for Mercurial and Linux
* [PATCH 0/5] Adds implementation of TFRC-SP on DCCP test tree
From: Ivo Calado @ 2009-09-08 18:28 UTC
To: dccp; +Cc: netdev
Due to the problems in the previous patch set pointed out by Gerrit Renker
and others, I'm resending the patches.
These patches add an implementation of TFRC-SP at the receiver side and
are targeted at the DCCP branch.
Patch #1: First Patch on TFRC-SP. Copy base files from TFRC
Patch #2: Implement loss counting on TFRC-SP receiver
Patch #3: Implement TFRC-SP calc of mean length of loss intervals
accordingly to section 3 of RFC 4828
Patch #4: Adds options DROPPED PACKETS and LOSS INTERVALS to receiver
Patch #5: Updating documentation accordingly
Following these patches, we'll be sending the sender side of TFRC-SP.
Once this code is integrated into the branch, we can proceed to add the
CCID4 code that uses this new TFRC-SP.
--
Ivo Augusto Andrade Rocha Calado
MSc. Candidate
Embedded Systems and Pervasive Computing Lab -
http://embedded.ufcg.edu.br
Systems and Computing Department - http://www.dsc.ufcg.edu.br
Electrical Engineering and Informatics Center -
http://www.ceei.ufcg.edu.br
Federal University of Campina Grande - http://www.ufcg.edu.br
PGP: 0x03422935
Quidquid latine dictum sit, altum videtur.
* [PATCH 1/5] First Patch on TFRC-SP. Copy base files from TFRC
From: Ivo Calado @ 2009-09-08 18:28 UTC
To: dccp; +Cc: netdev
First patch on TFRC-SP. Copies the base files from TFRC and renames their
symbols with the infix "_sp".
Also updates Kconfig and the init/exit code. An #ifndef guard was added to
headers that share unchanged symbols with TFRC, so those definitions don't
get included twice.
Following rule #8 in Documentation/SubmittingPatches, the patch is
stored at
http://embedded.ufcg.edu.br/~ivocalado/dccp/patches_tfrc_sp/tfrc_sp_receiver_01.patch
* [PATCH 3/5] Implement TFRC-SP calc of mean length of loss intervals, accordingly to section 3 of RFC 4828
From: Ivo Calado @ 2009-09-08 18:28 UTC
To: dccp; +Cc: netdev
Implement TFRC-SP calc of mean length of loss intervals according to section 3 of RFC 4828
Changes:
- Modify the tfrc_sp_lh_calc_i_mean signature so that it receives the current ccval and can
determine whether a loss interval is too recent
- Consider the number of losses in each loss interval
- Only consider the open loss interval if it is at least 2 RTTs old
- Change function signatures as necessary
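The calculation in the change list above can be illustrated with a standalone userspace sketch. It is an approximation under stated assumptions, not the patch code: struct toy_interval, toy_i_mean() and the is_short flag are hypothetical, and the caller is assumed to have already decided which intervals are recent enough to be scaled by their loss count. The weights are the ones used by the TFRC library (scaled by 10).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NINTERVAL 8

/* Loss interval weights, scaled by 10, as in the TFRC loss interval code */
static const uint32_t toy_weights[NINTERVAL] = { 10, 10, 10, 10, 8, 6, 4, 2 };

struct toy_interval {
	uint32_t length;  /* interval length in packets */
	uint32_t losses;  /* packets lost within the interval */
	int is_short;     /* assumption: caller marks intervals to be scaled */
};

/*
 * iv[0] is the open (most recent) interval, iv[n - 1] the oldest.
 * open_is_old_enough says whether the open interval may be included.
 */
static uint32_t toy_i_mean(const struct toy_interval *iv, size_t n,
			   int open_is_old_enough)
{
	uint32_t i_tot0 = 0, i_tot1 = 0, w_tot = 0;
	size_t i, k;

	assert(n >= 2 && n <= NINTERVAL + 1);
	k = n - 1;

	for (i = 0; i <= k; i++) {
		uint32_t len = iv[i].length;

		/* short intervals count as length divided by number of losses */
		if (iv[i].is_short && iv[i].losses > 0)
			len /= iv[i].losses;

		if (i < k) {
			i_tot0 += len * toy_weights[i];
			w_tot += toy_weights[i];
		}
		if (i > 0)
			i_tot1 += len * toy_weights[i - 1];
	}

	/* w_tot is non-zero here because k >= 1 */
	if (open_is_old_enough)
		return (i_tot0 > i_tot1 ? i_tot0 : i_tot1) / w_tot;
	return i_tot1 / w_tot;
}
```

With three intervals of lengths 400, 50 (5 losses, marked short) and 100, the sums are i_tot0 = 4100 and i_tot1 = 1100 with w_tot = 20, so the mean is 205 when the open interval is included and 55 when it is not.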
Signed-off-by: Ivo Calado, Erivaldo Xavier, Leandro Sales <ivocalado@embedded.ufcg.edu.br>, <desadoc@gmail.com>, <leandroal@gmail.com>
Index: dccp_tree_work5/net/dccp/ccids/lib/loss_interval_sp.c
===================================================================
--- dccp_tree_work5.orig/net/dccp/ccids/lib/loss_interval_sp.c 2009-09-08 10:37:16.000000000 -0300
+++ dccp_tree_work5/net/dccp/ccids/lib/loss_interval_sp.c 2009-09-08 10:42:30.000000000 -0300
@@ -67,10 +67,11 @@
}
}
-static void tfrc_sp_lh_calc_i_mean(struct tfrc_loss_hist *lh)
+static void tfrc_sp_lh_calc_i_mean(struct tfrc_loss_hist *lh, __u8 curr_ccval)
{
u32 i_i, i_tot0 = 0, i_tot1 = 0, w_tot = 0;
int i, k = tfrc_lh_length(lh) - 1; /* k is as in rfc3448bis, 5.4 */
+ u32 losses;
if (k <= 0)
return;
@@ -78,6 +79,15 @@
for (i = 0; i <= k; i++) {
i_i = tfrc_lh_get_interval(lh, i);
+ if (SUB16(curr_ccval,
+ tfrc_lh_get_loss_interval(lh, i)->li_ccval) <= 8) {
+
+ losses = tfrc_lh_get_loss_interval(lh, i)->li_losses;
+
+ if (losses > 0)
+ i_i = div64_u64(i_i, losses);
+ }
+
if (i < k) {
i_tot0 += i_i * tfrc_lh_weights[i];
w_tot += tfrc_lh_weights[i];
@@ -87,6 +97,11 @@
}
lh->i_mean = max(i_tot0, i_tot1) / w_tot;
+ BUG_ON(w_tot == 0);
+ if (SUB16(curr_ccval, tfrc_lh_get_loss_interval(lh, 0)->li_ccval) > 8)
+ lh->i_mean = max(i_tot0, i_tot1) / w_tot;
+ else
+ lh->i_mean = i_tot1 / w_tot;
}
/*
@@ -127,7 +142,7 @@
return;
cur->li_length = len;
- tfrc_sp_lh_calc_i_mean(lh);
+ tfrc_sp_lh_calc_i_mean(lh, dccp_hdr(skb)->dccph_ccval);
}
/* RFC 4342, 10.2: test for the existence of packet with sequence number S */
@@ -148,7 +163,8 @@
bool tfrc_sp_lh_interval_add(struct tfrc_loss_hist *lh,
struct tfrc_rx_hist *rh,
u32 (*calc_first_li)(struct sock *),
- struct sock *sk)
+ struct sock *sk,
+ __u8 ccval)
{
struct tfrc_loss_interval *cur = tfrc_lh_peek(lh);
struct tfrc_rx_hist_entry *cong_evt;
@@ -217,7 +233,7 @@
if (lh->counter > (2*LIH_SIZE))
lh->counter -= LIH_SIZE;
- tfrc_sp_lh_calc_i_mean(lh);
+ tfrc_sp_lh_calc_i_mean(lh, ccval);
}
return true;
Index: dccp_tree_work5/net/dccp/ccids/lib/loss_interval_sp.h
===================================================================
--- dccp_tree_work5.orig/net/dccp/ccids/lib/loss_interval_sp.h 2009-09-08 10:37:16.000000000 -0300
+++ dccp_tree_work5/net/dccp/ccids/lib/loss_interval_sp.h 2009-09-08 10:42:30.000000000 -0300
@@ -73,7 +73,8 @@
extern bool tfrc_sp_lh_interval_add(struct tfrc_loss_hist *,
struct tfrc_rx_hist *,
u32 (*first_li)(struct sock *),
- struct sock *);
+ struct sock *,
+ __u8 ccval);
extern void tfrc_sp_lh_update_i_mean(struct tfrc_loss_hist *lh,
struct sk_buff *);
extern void tfrc_sp_lh_cleanup(struct tfrc_loss_hist *lh);
Index: dccp_tree_work5/net/dccp/ccids/lib/packet_history_sp.c
===================================================================
--- dccp_tree_work5.orig/net/dccp/ccids/lib/packet_history_sp.c 2009-09-08 10:37:16.000000000 -0300
+++ dccp_tree_work5/net/dccp/ccids/lib/packet_history_sp.c 2009-09-08 10:42:30.000000000 -0300
@@ -369,7 +369,8 @@
/*
* Update Loss Interval database and recycle RX records
*/
- new_event = tfrc_sp_lh_interval_add(lh, h, first_li, sk);
+ new_event = tfrc_sp_lh_interval_add(lh, h, first_li, sk,
+ dccp_hdr(skb)->dccph_ccval);
__three_after_loss(h);
} else if (dccp_data_packet(skb) && dccp_skb_is_ecn_ce(skb)) {
@@ -378,7 +379,8 @@
* the RFC considers ECN marks - a future implementation may
* find it useful to also check ECN marks on non-data packets.
*/
- new_event = tfrc_sp_lh_interval_add(lh, h, first_li, sk);
+ new_event = tfrc_sp_lh_interval_add(lh, h, first_li, sk,
+ dccp_hdr(skb)->dccph_ccval);
/*
* Also combinations of loss and ECN-marks (as per the warning)
* are not supported. The permutations of loss combined with or
* [PATCH 2/5] Implement loss counting on TFRC-SP receiver
From: Ivo Calado @ 2009-09-08 18:28 UTC
To: dccp; +Cc: netdev
Implement loss counting on TFRC-SP receiver. The size of each hole in the received sequence is used as the loss count.
Changes:
- Adds field li_losses to tfrc_loss_interval to track the loss count per interval
- Adds field num_losses to tfrc_rx_hist, used to store the loss count per loss event
- Adds dccp_loss_count function to net/dccp/dccp.h, which computes the loss count from sequence numbers
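As a rough illustration of counting losses from sequence numbers, here is a userspace sketch. It assumes the intended count is the sequence gap minus one, minus the NDP count (non-data packets seen in between), clamped at zero; toy_delta_seqno() and toy_loss_count() are hypothetical stand-ins, not the kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SEQ_MASK ((1ULL << 48) - 1) /* DCCP sequence numbers are 48 bits */

/* Signed distance from s1 to s2 in the 48-bit circular sequence space. */
static int64_t toy_delta_seqno(uint64_t s1, uint64_t s2)
{
	uint64_t delta = (s2 - s1) & TOY_SEQ_MASK;

	/* values with bit 47 set represent negative distances */
	if (delta & (1ULL << 47))
		return (int64_t)delta - (int64_t)(1ULL << 48);
	return (int64_t)delta;
}

/*
 * Number of data packets missing between s1 and the later s2, given that
 * ndp non-data packets were sent in between. Adjacent packets (gap of 1)
 * give a count of 0.
 */
static uint64_t toy_loss_count(uint64_t s1, uint64_t s2, uint64_t ndp)
{
	int64_t missing = toy_delta_seqno(s1, s2);

	assert(missing >= 0); /* s2 is expected to follow s1 */
	missing -= (int64_t)ndp + 1;

	return missing > 0 ? (uint64_t)missing : 0;
}
```

For example, receiving sequence number 14 after 10 with no non-data packets reported means 11, 12 and 13 are missing, so the count is 3; with an NDP count of 2 only one data packet is unaccounted for.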
Signed-off-by: Ivo Calado, Erivaldo Xavier, Leandro Sales <ivocalado@embedded.ufcg.edu.br>, <desadoc@gmail.com>, <leandroal@gmail.com>
Index: dccp_tree_work4/net/dccp/ccids/lib/loss_interval_sp.c
===================================================================
--- dccp_tree_work4.orig/net/dccp/ccids/lib/loss_interval_sp.c 2009-09-03 22:58:17.000000000 -0300
+++ dccp_tree_work4/net/dccp/ccids/lib/loss_interval_sp.c 2009-09-03 23:00:24.000000000 -0300
@@ -187,6 +187,7 @@
s64 len = dccp_delta_seqno(cur->li_seqno, cong_evt_seqno);
if ((len <= 0) ||
(!tfrc_lh_closed_check(cur, cong_evt->tfrchrx_ccval))) {
+ cur->li_losses += rh->num_losses;
return false;
}
@@ -204,6 +205,7 @@
cur->li_seqno = cong_evt_seqno;
cur->li_ccval = cong_evt->tfrchrx_ccval;
cur->li_is_closed = false;
+ cur->li_losses = rh->num_losses;
if (++lh->counter == 1)
lh->i_mean = cur->li_length = (*calc_first_li)(sk);
Index: dccp_tree_work4/net/dccp/ccids/lib/loss_interval_sp.h
===================================================================
--- dccp_tree_work4.orig/net/dccp/ccids/lib/loss_interval_sp.h 2009-09-03 22:58:17.000000000 -0300
+++ dccp_tree_work4/net/dccp/ccids/lib/loss_interval_sp.h 2009-09-03 23:00:24.000000000 -0300
@@ -30,12 +30,14 @@
* @li_ccval: The CCVal belonging to @li_seqno
* @li_is_closed: Whether @li_seqno is older than 1 RTT
* @li_length: Loss interval sequence length
+ * @li_losses: Number of losses counted on this interval
*/
struct tfrc_loss_interval {
u64 li_seqno:48,
li_ccval:4,
li_is_closed:1;
u32 li_length;
+ u32 li_losses;
};
/*
Index: dccp_tree_work4/net/dccp/ccids/lib/packet_history_sp.c
===================================================================
--- dccp_tree_work4.orig/net/dccp/ccids/lib/packet_history_sp.c 2009-09-03 22:58:17.000000000 -0300
+++ dccp_tree_work4/net/dccp/ccids/lib/packet_history_sp.c 2009-09-03 23:00:24.000000000 -0300
@@ -244,6 +244,7 @@
h->loss_count = 3;
tfrc_sp_rx_hist_entry_from_skb(tfrc_rx_hist_entry(h, 3),
skb, n3);
+ h->num_losses = dccp_loss_count(s2, s3, n3);
return 1;
}
@@ -257,6 +258,7 @@
tfrc_sp_rx_hist_entry_from_skb(tfrc_rx_hist_entry(h, 2),
skb, n3);
h->loss_count = 3;
+ h->num_losses = dccp_loss_count(s1, s3, n3);
return 1;
}
@@ -293,6 +295,7 @@
h->loss_start = tfrc_rx_hist_index(h, 3);
tfrc_sp_rx_hist_entry_from_skb(tfrc_rx_hist_entry(h, 1), skb, n3);
h->loss_count = 3;
+ h->num_losses = dccp_loss_count(s0, s3, n3);
return 1;
}
Index: dccp_tree_work4/net/dccp/ccids/lib/packet_history_sp.h
===================================================================
--- dccp_tree_work4.orig/net/dccp/ccids/lib/packet_history_sp.h 2009-09-03 22:58:17.000000000 -0300
+++ dccp_tree_work4/net/dccp/ccids/lib/packet_history_sp.h 2009-09-03 22:58:29.000000000 -0300
@@ -113,6 +113,7 @@
u32 packet_size,
bytes_recvd;
ktime_t bytes_start;
+ u8 num_losses;
};
/*
Index: dccp_tree_work4/net/dccp/dccp.h
===================================================================
--- dccp_tree_work4.orig/net/dccp/dccp.h 2009-09-03 22:58:17.000000000 -0300
+++ dccp_tree_work4/net/dccp/dccp.h 2009-09-03 22:58:29.000000000 -0300
@@ -168,6 +168,21 @@
return (u64)delta <= ndp + 1;
}
+static inline u64 dccp_loss_count(const u64 s1, const u64 s2, const u64 ndp)
+{
+ s64 delta, count;
+
+ delta = dccp_delta_seqno(s1, s2);
+ WARN_ON(delta < 0);
+
+ count = delta;
+ count -= ndp + 1;
+
+ count = (count > 0) ? count : 0;
+
+ return (u64) count;
+}
+
enum {
DCCP_MIB_NUM = 0,
DCCP_MIB_ACTIVEOPENS, /* ActiveOpens */
* [PATCH 4/5] Adds options DROPPED PACKETS and LOSS INTERVALS to receiver
From: Ivo Calado @ 2009-09-08 18:28 UTC
To: dccp; +Cc: netdev
Adds options DROPPED PACKETS and LOSS INTERVALS to receiver. This patch adds the
mechanism for gathering and storing information about loss intervals, for later
construction of these two options.
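To make the option construction more concrete, here is a userspace sketch of writing one 9-byte loss interval record as three 24-bit big-endian fields. The layout (Lossless Length, then one ECN nonce bit followed by a 23-bit Loss Length, then Data Length) is my reading of RFC 4342, section 8.6, and should be checked against the RFC; put_be24(), struct toy_li_entry and toy_pack_li() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the low 24 bits of v at p in network byte order; return p + 3. */
static uint8_t *put_be24(uint8_t *p, uint32_t v)
{
	p[0] = (v >> 16) & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = v & 0xff;
	return p + 3;
}

struct toy_li_entry {
	uint32_t lossless_length; /* 24 bits used */
	uint32_t loss_length;     /* 23 bits used */
	uint32_t data_length;     /* 24 bits used */
	uint8_t ecn_nonce_sum;    /* 1 bit used */
};

/* Write one loss interval record into buf; returns bytes written (9). */
static size_t toy_pack_li(uint8_t *buf, const struct toy_li_entry *e)
{
	uint8_t *p = buf;

	assert(buf != NULL && e != NULL);
	p = put_be24(p, e->lossless_length & 0xffffff);
	/* nonce bit goes in the top bit of the second 24-bit field */
	p = put_be24(p, ((uint32_t)(e->ecn_nonce_sum & 1) << 23) |
			(e->loss_length & 0x7fffff));
	p = put_be24(p, e->data_length & 0xffffff);

	return (size_t)(p - buf);
}
```

Writing the bytes one at a time avoids depending on host endianness and on storing a 32-bit value through a byte pointer.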
Changes:
- Adds tfrc_loss_data and tfrc_loss_data_entry, structures that record loss interval information
- Adds dccp_skb_is_ecn_ect0 and dccp_skb_is_ecn_ect1, so that ECN codepoints can be checked and
used in the loss intervals option, which reports the ECN nonce sum
- Adds tfrc_sp_update_li_data, which updates information about loss intervals
- Adds tfrc_sp_ld_prepare_data, which fills the option fields in tfrc_loss_data with current values
- Adds a field of type struct tfrc_loss_data to struct tfrc_hc_rx_sock
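The ECN checks and the nonce sum mentioned in the list above can be sketched in userspace as follows. The codepoint values follow RFC 3168 (ECT(1) = 01, ECT(0) = 10, CE = 11); the toy_ names are hypothetical, and treating the nonce sum as the one-bit XOR of the nonces, with ECT(1) carrying nonce 1, is an assumption based on the ECN nonce scheme.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ECN codepoints in the two low bits of the TOS / traffic class byte */
enum {
	TOY_ECN_NOT_ECT = 0,
	TOY_ECN_ECT_1   = 1,
	TOY_ECN_ECT_0   = 2,
	TOY_ECN_CE      = 3,
	TOY_ECN_MASK    = 3,
};

static bool toy_is_ect0(uint8_t tos)
{
	return (tos & TOY_ECN_MASK) == TOY_ECN_ECT_0;
}

static bool toy_is_ect1(uint8_t tos)
{
	return (tos & TOY_ECN_MASK) == TOY_ECN_ECT_1;
}

/* One-bit nonce sum over n packets: toggles for every ECT(1) packet. */
static unsigned int toy_nonce_sum(const uint8_t *tos, size_t n)
{
	unsigned int sum = 0;
	size_t i;

	assert(tos != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (toy_is_ect1(tos[i]))
			sum ^= 1;

	return sum;
}
```

Note that the two predicates must compare against different codepoints; if both tested for ECT(0), the nonce sum would toggle on the wrong packets.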
Signed-off-by: Ivo Calado, Erivaldo Xavier, Leandro Sales <ivocalado@embedded.ufcg.edu.br>, <desadoc@gmail.com>, <leandroal@gmail.com>
Index: dccp_tree_work5/net/dccp/ccids/lib/packet_history_sp.c
===================================================================
--- dccp_tree_work5.orig/net/dccp/ccids/lib/packet_history_sp.c 2009-09-08 10:42:30.000000000 -0300
+++ dccp_tree_work5/net/dccp/ccids/lib/packet_history_sp.c 2009-09-08 10:42:37.000000000 -0300
@@ -233,7 +233,9 @@
}
/* return 1 if a new loss event has been identified */
-static int __two_after_loss(struct tfrc_rx_hist *h, struct sk_buff *skb, u32 n3)
+static int __two_after_loss(struct tfrc_rx_hist *h,
+ struct sk_buff *skb, u32 n3,
+ bool *new_loss)
{
u64 s0 = tfrc_rx_hist_loss_prev(h)->tfrchrx_seqno,
s1 = tfrc_rx_hist_entry(h, 1)->tfrchrx_seqno,
@@ -245,6 +247,7 @@
tfrc_sp_rx_hist_entry_from_skb(tfrc_rx_hist_entry(h, 3),
skb, n3);
h->num_losses = dccp_loss_count(s2, s3, n3);
+ *new_loss = 1;
return 1;
}
@@ -259,6 +262,7 @@
skb, n3);
h->loss_count = 3;
h->num_losses = dccp_loss_count(s1, s3, n3);
+ *new_loss = 1;
return 1;
}
@@ -284,6 +288,7 @@
tfrc_sp_rx_hist_entry_from_skb(
tfrc_rx_hist_loss_prev(h), skb, n3);
+ *new_loss = 0;
return 0;
}
@@ -297,6 +302,7 @@
h->loss_count = 3;
h->num_losses = dccp_loss_count(s0, s3, n3);
+ *new_loss = 1;
return 1;
}
@@ -348,11 +354,14 @@
* operations when loss_count is greater than 0 after calling this function.
*/
bool tfrc_sp_rx_congestion_event(struct tfrc_rx_hist *h,
- struct tfrc_loss_hist *lh,
- struct sk_buff *skb, const u64 ndp,
- u32 (*first_li)(struct sock *), struct sock *sk)
+ struct tfrc_loss_hist *lh,
+ struct tfrc_loss_data *ld,
+ struct sk_buff *skb, const u64 ndp,
+ u32 (*first_li)(struct sock *),
+ struct sock *sk)
{
bool new_event = false;
+ bool new_loss = false;
if (tfrc_sp_rx_hist_duplicate(h, skb))
return 0;
@@ -365,12 +374,13 @@
__one_after_loss(h, skb, ndp);
} else if (h->loss_count != 2) {
DCCP_BUG("invalid loss_count %d", h->loss_count);
- } else if (__two_after_loss(h, skb, ndp)) {
+ } else if (__two_after_loss(h, skb, ndp, &new_loss)) {
/*
* Update Loss Interval database and recycle RX records
*/
new_event = tfrc_sp_lh_interval_add(lh, h, first_li, sk,
dccp_hdr(skb)->dccph_ccval);
+ tfrc_sp_update_li_data(ld, h, skb, new_loss, new_event);
__three_after_loss(h);
} else if (dccp_data_packet(skb) && dccp_skb_is_ecn_ce(skb)) {
@@ -396,6 +406,8 @@
}
}
+ tfrc_sp_update_li_data(ld, h, skb, new_loss, new_event);
+
/*
* Update moving-average of `s' and the sum of received payload bytes.
*/
Index: dccp_tree_work5/net/dccp/ccids/lib/loss_interval_sp.c
===================================================================
--- dccp_tree_work5.orig/net/dccp/ccids/lib/loss_interval_sp.c 2009-09-08 10:42:30.000000000 -0300
+++ dccp_tree_work5/net/dccp/ccids/lib/loss_interval_sp.c 2009-09-08 10:42:37.000000000 -0300
@@ -14,6 +14,7 @@
#include "tfrc_sp.h"
static struct kmem_cache *tfrc_lh_slab __read_mostly;
+static struct kmem_cache *tfrc_ld_slab __read_mostly;
/* Loss Interval weights from [RFC 3448, 5.4], scaled by 10 */
static const int tfrc_lh_weights[NINTERVAL] = { 10, 10, 10, 10, 8, 6, 4, 2 };
@@ -67,6 +68,224 @@
}
}
+/*
+ * Allocation routine for new entries of loss interval data
+ */
+static struct tfrc_loss_data_entry *tfrc_ld_add_new(struct tfrc_loss_data *ld)
+{
+ struct tfrc_loss_data_entry *new =
+ kmem_cache_alloc(tfrc_ld_slab, GFP_ATOMIC);
+
+ if (new == NULL)
+ return NULL;
+
+ memset(new, 0, sizeof(struct tfrc_loss_data_entry));
+
+ new->next = ld->head;
+ ld->head = new;
+ ld->counter++;
+
+ return new;
+}
+
+void tfrc_sp_ld_cleanup(struct tfrc_loss_data *ld)
+{
+ struct tfrc_loss_data_entry *next, *h = ld->head;
+
+ if (!h)
+ return;
+
+ while (h) {
+ next = h->next;
+ kmem_cache_free(tfrc_ld_slab, h);
+ h = next;
+ }
+
+ ld->head = NULL;
+ ld->counter = 0;
+}
+
+void tfrc_sp_ld_prepare_data(u8 loss_count, struct tfrc_loss_data *ld)
+{
+ u8 *li_ofs, *d_ofs;
+ struct tfrc_loss_data_entry *e;
+ u16 count;
+
+ li_ofs = &ld->loss_intervals_opts[0];
+ d_ofs = &ld->drop_opts[0];
+
+ count = 0;
+ e = ld->head;
+
+ *li_ofs = loss_count + 1;
+ li_ofs++;
+
+ while (e != NULL) {
+
+ if (count < TFRC_LOSS_INTERVALS_OPT_MAX_LENGTH) {
+ *li_ofs = ((htonl(e->lossless_length) & 0x00FFFFFF)<<8);
+ li_ofs += 3;
+ *li_ofs = ((e->ecn_nonce_sum&0x1) << 31) &
+ (htonl((e->loss_length & 0x00FFFFFF))<<8);
+ li_ofs += 3;
+ *li_ofs = ((htonl(e->data_length) & 0x00FFFFFF)<<8);
+ li_ofs += 3;
+ }
+
+ if (count < TFRC_DROP_OPT_MAX_LENGTH) {
+ *d_ofs = (htonl(e->drop_count) & 0x00FFFFFF)<<8;
+ d_ofs += 3;
+ }
+
+ if ((count >= TFRC_LOSS_INTERVALS_OPT_MAX_LENGTH) &&
+ (count >= TFRC_DROP_OPT_MAX_LENGTH))
+ break;
+
+ count++;
+ e = e->next;
+ }
+}
+
+void tfrc_sp_update_li_data(struct tfrc_loss_data *ld,
+ struct tfrc_rx_hist *rh,
+ struct sk_buff *skb,
+ bool new_loss, bool new_event)
+{
+ struct tfrc_loss_data_entry *new, *h;
+
+ if (!dccp_data_packet(skb))
+ return;
+
+ if (ld->head == NULL) {
+ new = tfrc_ld_add_new(ld);
+ if (unlikely(new == NULL)) {
+ DCCP_CRIT("Cannot allocate new loss data registry.");
+ return;
+ }
+
+ if (new_loss) {
+ new->drop_count = rh->num_losses;
+ new->lossless_length = 1;
+ new->loss_length = rh->num_losses;
+
+ if (dccp_data_packet(skb))
+ new->data_length = 1;
+
+ if (dccp_data_packet(skb) && dccp_skb_is_ecn_ect1(skb))
+ new->ecn_nonce_sum = 1;
+ else
+ new->ecn_nonce_sum = 0;
+ } else {
+ new->drop_count = 0;
+ new->lossless_length = 1;
+ new->loss_length = 0;
+
+ if (dccp_data_packet(skb))
+ new->data_length = 1;
+
+ if (dccp_data_packet(skb) && dccp_skb_is_ecn_ect1(skb))
+ new->ecn_nonce_sum = 1;
+ else
+ new->ecn_nonce_sum = 0;
+ }
+
+ return;
+ }
+
+ if (new_event) {
+ new = tfrc_ld_add_new(ld);
+ if (unlikely(new == NULL)) {
+ DCCP_CRIT("Cannot allocate new loss data registry. "
+ "Cleaning up.");
+ tfrc_sp_ld_cleanup(ld);
+ return;
+ }
+
+ new->drop_count = rh->num_losses;
+ new->lossless_length = (ld->last_loss_count - rh->loss_count);
+ new->loss_length = rh->num_losses;
+
+ new->ecn_nonce_sum = 0;
+ new->data_length = 0;
+
+ while (ld->last_loss_count > rh->loss_count) {
+ ld->last_loss_count--;
+
+ if (ld->sto_is_data & (1 << (ld->last_loss_count))) {
+ new->data_length++;
+
+ if (ld->sto_ecn & (1 << (ld->last_loss_count)))
+ new->ecn_nonce_sum =
+ !new->ecn_nonce_sum;
+ }
+ }
+
+ return;
+ }
+
+ h = ld->head;
+
+ if (rh->loss_count > ld->last_loss_count) {
+ ld->last_loss_count = rh->loss_count;
+
+ if (dccp_data_packet(skb))
+ ld->sto_is_data |= (1 << (ld->last_loss_count - 1));
+
+ if (dccp_skb_is_ecn_ect1(skb))
+ ld->sto_ecn |= (1 << (ld->last_loss_count - 1));
+
+ return;
+ }
+
+ if (new_loss) {
+ h->drop_count += rh->num_losses;
+ h->lossless_length = (ld->last_loss_count - rh->loss_count);
+ h->loss_length += h->lossless_length + rh->num_losses;
+
+ h->ecn_nonce_sum = 0;
+ h->data_length = 0;
+
+ while (ld->last_loss_count > rh->loss_count) {
+ ld->last_loss_count--;
+
+ if (ld->sto_is_data&(1 << (ld->last_loss_count))) {
+ h->data_length++;
+
+ if (ld->sto_ecn & (1 << (ld->last_loss_count)))
+ h->ecn_nonce_sum = !h->ecn_nonce_sum;
+ }
+ }
+
+ return;
+ }
+
+ if (ld->last_loss_count > rh->loss_count) {
+ while (ld->last_loss_count > rh->loss_count) {
+ ld->last_loss_count--;
+
+ h->lossless_length++;
+
+ if (ld->sto_is_data & (1 << (ld->last_loss_count))) {
+ h->data_length++;
+
+ if (ld->sto_ecn & (1 << (ld->last_loss_count)))
+ h->ecn_nonce_sum = !h->ecn_nonce_sum;
+ }
+ }
+
+ return;
+ }
+
+ h->lossless_length++;
+
+ if (dccp_data_packet(skb)) {
+ h->data_length++;
+
+ if (dccp_skb_is_ecn_ect1(skb))
+ h->ecn_nonce_sum = !h->ecn_nonce_sum;
+ }
+}
+
static void tfrc_sp_lh_calc_i_mean(struct tfrc_loss_hist *lh, __u8 curr_ccval)
{
u32 i_i, i_tot0 = 0, i_tot1 = 0, w_tot = 0;
@@ -244,8 +463,11 @@
tfrc_lh_slab = kmem_cache_create("tfrc_sp_li_hist",
sizeof(struct tfrc_loss_interval), 0,
SLAB_HWCACHE_ALIGN, NULL);
+ tfrc_ld_slab = kmem_cache_create("tfrc_sp_li_data",
+ sizeof(struct tfrc_loss_data_entry), 0,
+ SLAB_HWCACHE_ALIGN, NULL);
- if ((tfrc_lh_slab != NULL))
+ if ((tfrc_lh_slab != NULL) && (tfrc_ld_slab != NULL))
return 0;
if (tfrc_lh_slab != NULL) {
@@ -253,6 +475,11 @@
tfrc_lh_slab = NULL;
}
+ if (tfrc_ld_slab != NULL) {
+ kmem_cache_destroy(tfrc_ld_slab);
+ tfrc_ld_slab = NULL;
+ }
+
return -ENOBUFS;
}
@@ -262,4 +489,9 @@
kmem_cache_destroy(tfrc_lh_slab);
tfrc_lh_slab = NULL;
}
+
+ if (tfrc_ld_slab != NULL) {
+ kmem_cache_destroy(tfrc_ld_slab);
+ tfrc_ld_slab = NULL;
+ }
}
Index: dccp_tree_work5/net/dccp/ccids/lib/loss_interval_sp.h
===================================================================
--- dccp_tree_work5.orig/net/dccp/ccids/lib/loss_interval_sp.h 2009-09-08 10:42:30.000000000 -0300
+++ dccp_tree_work5/net/dccp/ccids/lib/loss_interval_sp.h 2009-09-08 10:42:37.000000000 -0300
@@ -70,13 +70,52 @@
struct tfrc_rx_hist;
#endif
+struct tfrc_loss_data_entry {
+ struct tfrc_loss_data_entry *next;
+ u32 lossless_length:24;
+ u8 ecn_nonce_sum:1;
+ u32 loss_length:24;
+ u32 data_length:24;
+ u32 drop_count:24;
+};
+
+#define TFRC_LOSS_INTERVALS_OPT_MAX_LENGTH 28
+#define TFRC_DROP_OPT_MAX_LENGTH 84
+#define TFRC_LI_OPT_SZ \
+ (2 + TFRC_LOSS_INTERVALS_OPT_MAX_LENGTH*9)
+#define TFRC_DROPPED_OPT_SZ \
+ (1 + TFRC_DROP_OPT_MAX_LENGTH*3)
+
+struct tfrc_loss_data {
+ struct tfrc_loss_data_entry *head;
+ u16 counter;
+ u8 loss_intervals_opts[TFRC_LI_OPT_SZ];
+ u8 drop_opts[TFRC_DROPPED_OPT_SZ];
+ u8 last_loss_count;
+ u8 sto_ecn;
+ u8 sto_is_data;
+};
+
+static inline void tfrc_ld_init(struct tfrc_loss_data *ld)
+{
+ memset(ld, 0, sizeof(struct tfrc_loss_data));
+}
+
+struct tfrc_rx_hist;
+
extern bool tfrc_sp_lh_interval_add(struct tfrc_loss_hist *,
struct tfrc_rx_hist *,
u32 (*first_li)(struct sock *),
struct sock *,
__u8 ccval);
+extern void tfrc_sp_update_li_data(struct tfrc_loss_data *,
+ struct tfrc_rx_hist *,
+ struct sk_buff *,
+ bool new_loss, bool new_event);
extern void tfrc_sp_lh_update_i_mean(struct tfrc_loss_hist *lh,
struct sk_buff *);
extern void tfrc_sp_lh_cleanup(struct tfrc_loss_hist *lh);
+extern void tfrc_sp_ld_cleanup(struct tfrc_loss_data *ld);
+extern void tfrc_sp_ld_prepare_data(u8 loss_count, struct tfrc_loss_data *ld);
#endif /* _DCCP_LI_HIST_SP_ */
Index: dccp_tree_work5/net/dccp/ccids/lib/packet_history_sp.h
===================================================================
--- dccp_tree_work5.orig/net/dccp/ccids/lib/packet_history_sp.h 2009-09-08 10:42:30.000000000 -0300
+++ dccp_tree_work5/net/dccp/ccids/lib/packet_history_sp.h 2009-09-08 10:42:37.000000000 -0300
@@ -203,6 +203,7 @@
extern bool tfrc_sp_rx_congestion_event(struct tfrc_rx_hist *h,
struct tfrc_loss_hist *lh,
+ struct tfrc_loss_data *ld,
struct sk_buff *skb, const u64 ndp,
u32 (*first_li)(struct sock *sk),
struct sock *sk);
Index: dccp_tree_work5/net/dccp/ccids/lib/tfrc_ccids_sp.h
===================================================================
--- dccp_tree_work5.orig/net/dccp/ccids/lib/tfrc_ccids_sp.h 2009-09-08 10:42:30.000000000 -0300
+++ dccp_tree_work5/net/dccp/ccids/lib/tfrc_ccids_sp.h 2009-09-08 10:42:37.000000000 -0300
@@ -129,6 +129,7 @@
* @tstamp_last_feedback - Time at which last feedback was sent
* @hist - Packet history (loss detection + RTT sampling)
* @li_hist - Loss Interval database
+ * @li_data - Loss Interval data for options
* @p_inverse - Inverse of Loss Event Rate (RFC 4342, sec. 8.5)
*/
struct tfrc_hc_rx_sock {
@@ -138,6 +139,7 @@
ktime_t tstamp_last_feedback;
struct tfrc_rx_hist hist;
struct tfrc_loss_hist li_hist;
+ struct tfrc_loss_data li_data;
#define p_inverse li_hist.i_mean
};
Index: dccp_tree_work5/net/dccp/dccp.h
===================================================================
--- dccp_tree_work5.orig/net/dccp/dccp.h 2009-09-08 10:42:30.000000000 -0300
+++ dccp_tree_work5/net/dccp/dccp.h 2009-09-08 10:42:37.000000000 -0300
@@ -403,6 +403,16 @@
return (DCCP_SKB_CB(skb)->dccpd_ecn & INET_ECN_MASK) == INET_ECN_CE;
}
+static inline bool dccp_skb_is_ecn_ect0(const struct sk_buff *skb)
+{
+ return (DCCP_SKB_CB(skb)->dccpd_ecn & INET_ECN_MASK) == INET_ECN_ECT_0;
+}
+
+static inline bool dccp_skb_is_ecn_ect1(const struct sk_buff *skb)
+{
+ return (DCCP_SKB_CB(skb)->dccpd_ecn & INET_ECN_MASK) == INET_ECN_ECT_1;
+}
+
/* RFC 4340, sec. 7.7 */
static inline int dccp_non_data_packet(const struct sk_buff *skb)
{
* [PATCH 5/5] Updating documentation accordingly
From: Ivo Calado @ 2009-09-08 18:28 UTC
To: dccp; +Cc: netdev
Updating documentation accordingly
Signed-off-by: Ivo Calado, Erivaldo Xavier, Leandro Sales <ivocalado@embedded.ufcg.edu.br>, <desadoc@gmail.com>, <leandroal@gmail.com>
Index: dccp_tree_work5/net/dccp/ccids/lib/loss_interval_sp.c
===================================================================
--- dccp_tree_work5.orig/net/dccp/ccids/lib/loss_interval_sp.c 2009-09-08 10:42:37.000000000 -0300
+++ dccp_tree_work5/net/dccp/ccids/lib/loss_interval_sp.c 2009-09-08 11:03:15.000000000 -0300
@@ -1,4 +1,6 @@
/*
+ * Copyright (c) 2009 Federal University of Campina Grande,
+ * Embedded Systems and Pervasive Computing Lab
* Copyright (c) 2007 The University of Aberdeen, Scotland, UK
* Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand.
* Copyright (c) 2005-7 Ian McDonald <ian.mcdonald@jandi.co.nz>
@@ -105,6 +107,13 @@
ld->counter = 0;
}
+/*
+ * tfrc_sp_ld_prepare_data - updates arrays on tfrc_loss_data
+ * so they can be sent as options
+ * @loss_count: current loss count (packets after hole on transmission),
+ * used to determine skip length for loss intervals option
+ * @ld: loss intervals data being updated
+ */
void tfrc_sp_ld_prepare_data(u8 loss_count, struct tfrc_loss_data *ld)
{
u8 *li_ofs, *d_ofs;
@@ -146,6 +155,16 @@
}
}
+/*
+ * tfrc_sp_update_li_data - Update tfrc_loss_data upon
+ * packet receiving or loss detection
+ * @ld: tfrc_loss_data being updated
+ * @rh: loss event record
+ * @skb: received packet
+ * @new_loss: dictates if new loss was detected
+ * upon receiving current packet
+ * @new_event: ...and if the loss starts new loss interval
+ */
void tfrc_sp_update_li_data(struct tfrc_loss_data *ld,
struct tfrc_rx_hist *rh,
struct sk_buff *skb,
@@ -324,7 +343,7 @@
}
/*
- * tfrc_lh_update_i_mean - Update the `open' loss interval I_0
+ * tfrc_sp_lh_update_i_mean - Update the `open' loss interval I_0
* This updates I_mean as the sequence numbers increase. As a consequence, the
* open loss interval I_0 increases, hence p = W_tot/max(I_tot0, I_tot1)
* decreases, and thus there is no need to send renewed feedback.
@@ -372,7 +391,8 @@
return cur->li_is_closed;
}
-/* tfrc_lh_interval_add - Insert new record into the Loss Interval database
+/*
+ * tfrc_sp_lh_interval_add - Insert new record into the Loss Interval database
* @lh: Loss Interval database
* @rh: Receive history containing a fresh loss event
* @calc_first_li: Caller-dependent routine to compute length of first interval
Index: dccp_tree_work5/net/dccp/ccids/lib/loss_interval_sp.h
===================================================================
--- dccp_tree_work5.orig/net/dccp/ccids/lib/loss_interval_sp.h 2009-09-08 10:42:37.000000000 -0300
+++ dccp_tree_work5/net/dccp/ccids/lib/loss_interval_sp.h 2009-09-08 10:55:15.000000000 -0300
@@ -1,6 +1,8 @@
#ifndef _DCCP_LI_HIST_SP_
#define _DCCP_LI_HIST_SP_
/*
+ * Copyright (c) 2009 Federal University of Campina Grande,
+ * Embedded Systems and Pervasive Computing Lab
* Copyright (c) 2007 The University of Aberdeen, Scotland, UK
* Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand.
* Copyright (c) 2005-7 Ian McDonald <ian.mcdonald@jandi.co.nz>
@@ -70,6 +72,15 @@
struct tfrc_rx_hist;
#endif
+/*
+ * tfrc_loss_data_entry - Holds info about one loss interval
+ * @next: next entry on this linked list
+ * @lossless_length: length of lossless sequence
+ * @ecn_nonce_sum: ecn nonce sum for this interval
+ * @loss_length: length of lossy part
+ * @data_length: data length on lossless part
+ * @drop_count: count of dropped packets
+ */
struct tfrc_loss_data_entry {
struct tfrc_loss_data_entry *next;
u32 lossless_length:24;
@@ -79,13 +90,29 @@
u32 drop_count:24;
};
+/* As defined at section 8.6.1. of RFC 4342 */
#define TFRC_LOSS_INTERVALS_OPT_MAX_LENGTH 28
+/* Specified on section 8.7. of CCID4 draft */
#define TFRC_DROP_OPT_MAX_LENGTH 84
#define TFRC_LI_OPT_SZ \
(2 + TFRC_LOSS_INTERVALS_OPT_MAX_LENGTH*9)
#define TFRC_DROPPED_OPT_SZ \
(1 + TFRC_DROP_OPT_MAX_LENGTH*3)
+/*
+ * tfrc_loss_data - loss interval data
+ * used by loss intervals and dropped packets options
+ * @head: linked list containing loss interval data
+ * @counter: number of entries
+ * @loss_intervals_opts: space necessary for writing temporary option
+ * data for loss intervals option
+ * @drop_opts: same for dropped packets option
+ * @last_loss_count: last loss count (num. of packets
+ * after hole on transmission) observed
+ * @sto_ecn: ecn's observed while waiting for hole
+ * to be filled or accepted as missing
+ * @sto_is_data: flags recording whether the packets seen were data packets
+ */
struct tfrc_loss_data {
struct tfrc_loss_data_entry *head;
u16 counter;
Index: dccp_tree_work5/net/dccp/ccids/lib/packet_history_sp.c
===================================================================
--- dccp_tree_work5.orig/net/dccp/ccids/lib/packet_history_sp.c 2009-09-08 10:42:37.000000000 -0300
+++ dccp_tree_work5/net/dccp/ccids/lib/packet_history_sp.c 2009-09-08 10:57:07.000000000 -0300
@@ -4,6 +4,14 @@
*
* An implementation of the DCCP protocol
*
+ * Copyright (c) 2009 Ivo Calado, Erivaldo Xavier, Leandro Sales
+ *
+ * This code has been developed by the Federal University of Campina Grande
+ * Embedded Systems and Pervasive Computing Lab.
+ * For further information please see http://embedded.ufcg.edu.br/
+ * <ivocalado@embedded.ufcg.edu.br>,
+ * <desadoc@gmail.com>, <leandroal@gmail.com>
+ *
* This code has been developed by the University of Waikato WAND
* research group. For further information please see http://www.wand.net.nz/
* or e-mail Ian McDonald - ian.mcdonald@jandi.co.nz
@@ -339,7 +347,7 @@
}
/*
- * tfrc_rx_congestion_event - Loss detection and further processing
+ * tfrc_sp_rx_congestion_event - Loss detection and further processing
* @h: The non-empty RX history object
* @lh: Loss Intervals database to update
* @skb: Currently received packet
@@ -495,7 +503,7 @@
}
/*
- * tfrc_rx_hist_sample_rtt - Sample RTT from timestamp / CCVal
+ * tfrc_sp_rx_hist_sample_rtt - Sample RTT from timestamp / CCVal
* Based on ideas presented in RFC 4342, 8.1. This function expects that no loss
* is pending and uses the following history entries (via rtt_sample_prev):
* - h->ring[0] contains the most recent history entry prior to @skb;
Index: dccp_tree_work5/net/dccp/ccids/lib/packet_history_sp.h
===================================================================
--- dccp_tree_work5.orig/net/dccp/ccids/lib/packet_history_sp.h 2009-09-08 10:42:37.000000000 -0300
+++ dccp_tree_work5/net/dccp/ccids/lib/packet_history_sp.h 2009-09-08 10:57:36.000000000 -0300
@@ -1,6 +1,14 @@
/*
* Packet RX/TX history data structures and routines for TFRC-based protocols.
*
+ * Copyright (c) 2009 Ivo Calado, Erivaldo Xavier, Leandro Sales
+ *
+ * This code has been developed by the Federal University of Campina Grande
+ * Embedded Systems and Pervasive Computing Lab.
+ * For further information please see http://embedded.ufcg.edu.br/
+ * <ivocalado@embedded.ufcg.edu.br>,
+ * <desadoc@gmail.com>, <leandroal@gmail.com>
+ *
* Copyright (c) 2007 The University of Aberdeen, Scotland, UK
* Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand.
*
Index: dccp_tree_work5/net/dccp/ccids/lib/tfrc_ccids_sp.c
===================================================================
--- dccp_tree_work5.orig/net/dccp/ccids/lib/tfrc_ccids_sp.c 2009-09-08 10:26:38.000000000 -0300
+++ dccp_tree_work5/net/dccp/ccids/lib/tfrc_ccids_sp.c 2009-09-08 11:00:12.000000000 -0300
@@ -1,4 +1,6 @@
/*
+ * Copyright (c) 2009 Federal University of Campina Grande,
+ * Embedded Systems and Pervasive Computing Lab
* Copyright (c) 2007 Leandro Melo de Sales <leandroal@gmail.com>
* Copyright (c) 2005 Ian McDonald <ian.mcdonald@jandi.co.nz>
* Copyright (c) 2005 Arnaldo Carvalho de Melo <acme@conectiva.com.br>
Index: dccp_tree_work5/net/dccp/ccids/lib/tfrc_ccids_sp.h
===================================================================
--- dccp_tree_work5.orig/net/dccp/ccids/lib/tfrc_ccids_sp.h 2009-09-08 10:42:37.000000000 -0300
+++ dccp_tree_work5/net/dccp/ccids/lib/tfrc_ccids_sp.h 2009-09-08 11:00:31.000000000 -0300
@@ -1,4 +1,6 @@
/*
+ * Copyright (c) 2009 Federal University of Campina Grande,
+ * Embedded Systems and Pervasive Computing Lab
* Copyright (c) 2007 Leandro Melo de Sales <leandroal@gmail.com>
* Copyright (c) 2005 Ian McDonald <ian.mcdonald@jandi.co.nz>
* Copyright (c) 2005 Arnaldo Carvalho de Melo <acme@conectiva.com.br>
Index: dccp_tree_work5/net/dccp/ccids/lib/tfrc_equation_sp.c
===================================================================
--- dccp_tree_work5.orig/net/dccp/ccids/lib/tfrc_equation_sp.c 2009-09-08 10:26:38.000000000 -0300
+++ dccp_tree_work5/net/dccp/ccids/lib/tfrc_equation_sp.c 2009-09-08 11:01:45.000000000 -0300
@@ -1,4 +1,6 @@
/*
+ * Copyright (c) 2009 Federal University of Campina Grande,
+ * Embedded Systems and Pervasive Computing Lab
* Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand.
* Copyright (c) 2005 Ian McDonald <ian.mcdonald@jandi.co.nz>
* Copyright (c) 2005 Arnaldo Carvalho de Melo <acme@conectiva.com.br>
@@ -607,7 +609,7 @@
}
/*
- * tfrc_calc_x - Calculate the send rate as per section 3.1 of RFC3448
+ * tfrc_sp_calc_x - Calculate the send rate as per section 3.1 of RFC3448
*
* @s: packet size in bytes
* @R: RTT scaled by 1000000 (i.e., microseconds)
@@ -667,7 +669,7 @@
}
/*
- * tfrc_calc_x_reverse_lookup - try to find p given f(p)
+ * tfrc_sp_calc_x_reverse_lookup - try to find p given f(p)
*
* @fvalue: function value to match, scaled by 1000000
* Returns closest match for p, also scaled by 1000000
@@ -700,7 +702,7 @@
}
/*
- * tfrc_invert_loss_event_rate - Compute p so that 10^6 corresponds to 100%
+ * tfrc_sp_invert_loss_event_rate - Compute p so that 10^6 corresponds to 100%
* When @loss_event_rate is large, there is a chance that p is truncated to 0.
* To avoid re-entering slow-start in that case, we set p = TFRC_SMALLEST_P > 0.
*/
Index: dccp_tree_work5/net/dccp/ccids/lib/tfrc_sp.c
===================================================================
--- dccp_tree_work5.orig/net/dccp/ccids/lib/tfrc_sp.c 2009-09-08 10:26:38.000000000 -0300
+++ dccp_tree_work5/net/dccp/ccids/lib/tfrc_sp.c 2009-09-08 11:02:15.000000000 -0300
@@ -1,6 +1,10 @@
/*
* TFRC library initialisation
*
+ * Copyright (c) 2009 Federal University of Campina Grande,
+ * Embedded Systems and Pervasive Computing Lab
+ * Almost copied from tfrc.c, only renamed symbols
+ *
* Copyright (c) 2007 The University of Aberdeen, Scotland, UK
* Copyright (c) 2007 Arnaldo Carvalho de Melo <acme@redhat.com>
*/
Index: dccp_tree_work5/net/dccp/ccids/lib/tfrc_sp.h
===================================================================
--- dccp_tree_work5.orig/net/dccp/ccids/lib/tfrc_sp.h 2009-09-08 10:26:38.000000000 -0300
+++ dccp_tree_work5/net/dccp/ccids/lib/tfrc_sp.h 2009-09-08 11:02:31.000000000 -0300
@@ -1,6 +1,8 @@
#ifndef _TFRC_SP_H_
#define _TFRC_SP_H_
/*
+ * Copyright (c) 2009 Federal University of Campina Grande,
+ * Embedded Systems and Pervasive Computing Lab
* Copyright (c) 2007 The University of Aberdeen, Scotland, UK
* Copyright (c) 2005-6 The University of Waikato, Hamilton, New Zealand.
* Copyright (c) 2005-6 Ian McDonald <ian.mcdonald@jandi.co.nz>
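As a quick check of the option buffer sizes defined in the loss-interval header hunk above, the sketch below reproduces the two size macros and evaluates them. The macro names and constants are copied from the patch; the helper functions li_opt_bytes() and dropped_opt_bytes() are hypothetical and only restate the arithmetic (9 bytes per loss interval, 3 bytes per dropped-packets entry, plus the fixed overhead each macro adds).

```c
#include <assert.h>

/* Constants and size macros as they appear in the patch. */
#define TFRC_LOSS_INTERVALS_OPT_MAX_LENGTH 28
#define TFRC_DROP_OPT_MAX_LENGTH 84
#define TFRC_LI_OPT_SZ \
	(2 + TFRC_LOSS_INTERVALS_OPT_MAX_LENGTH*9)
#define TFRC_DROPPED_OPT_SZ \
	(1 + TFRC_DROP_OPT_MAX_LENGTH*3)

/* Hypothetical helper: bytes needed for n loss-interval entries. */
static int li_opt_bytes(int n)
{
	assert(n >= 0 && n <= TFRC_LOSS_INTERVALS_OPT_MAX_LENGTH);
	return 2 + n * 9;
}

/* Hypothetical helper: bytes needed for n dropped-packets entries. */
static int dropped_opt_bytes(int n)
{
	assert(n >= 0 && n <= TFRC_DROP_OPT_MAX_LENGTH);
	return 1 + n * 3;
}
```

With the maximum entry counts this gives 254 bytes for the loss intervals buffer and 253 bytes for the dropped packets buffer, so both come out just under 256 bytes.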
^ permalink raw reply
* Re: [RFC] defer skb allocation in virtio_net -- mergable buff part
From: Shirley Ma @ 2009-09-08 18:30 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: netdev, kvm, linux-kernel
In-Reply-To: <20090825114143.GA13884@redhat.com>
Thanks, Michael, for your detailed review comments. I am just back from my
vacation. I am working on what you have raised here.
Shirley
^ permalink raw reply
* appletalk: IPDDP_ENCAP and IPDDP_DECAP variables are confusing
From: Robert P. J. Day @ 2009-09-08 18:37 UTC (permalink / raw)
To: netdev
(i pointed out the first part of this to arnaldo but, after i looked
at it more closely, it's a bit messier than i thought so i'll just
toss it out to the list and let someone here figure out what to do
with it.)
from my latest tree scanning script looking for unused Kconfig
variables, we learn that:
$ grep -r IPDDP_DECAP drivers
drivers/net/appletalk/ipddp.c:static int ipddp_mode = IPDDP_DECAP;
drivers/net/appletalk/ipddp.c: if(ipddp_mode == IPDDP_DECAP)
drivers/net/appletalk/ipddp.c: if(ipddp_mode == IPDDP_DECAP)
drivers/net/appletalk/Kconfig:config IPDDP_DECAP
drivers/net/appletalk/ipddp.h:#define IPDDP_DECAP 2
$
which suggests that the Kconfig variable "IPDDP_DECAP" is utterly
redundant as the corresponding CONFIG_IPDDP_DECAP is not used anywhere
so the obvious solution is to simply remove that Kconfig variable.
until you search for the corresponding IPDDP_ENCAP variable:
$ grep -r IPDDP_ENCAP drivers
drivers/net/appletalk/ipddp.c:#ifdef CONFIG_IPDDP_ENCAP
drivers/net/appletalk/ipddp.c:static int ipddp_mode = IPDDP_ENCAP;
drivers/net/appletalk/ipddp.c: if(ipddp_mode == IPDDP_ENCAP)
drivers/net/appletalk/Kconfig:config IPDDP_ENCAP
drivers/net/appletalk/ipddp.h:#define IPDDP_ENCAP 1
$
the difference is this one *is* tested in ipddp.c, thusly:
#ifdef CONFIG_IPDDP_ENCAP
static int ipddp_mode = IPDDP_ENCAP;
#else
static int ipddp_mode = IPDDP_DECAP;
#endif
that makes it seem that those two settings should be mutually
exclusive, but that's not how they're defined in the Kconfig file:
=====
config IPDDP_ENCAP
bool "IP to Appletalk-IP Encapsulation support"
depends on IPDDP
help
If you say Y here, the AppleTalk-IP code will be able to encapsulate
IP packets inside AppleTalk frames; this is useful if your Linux box
is stuck on an AppleTalk network (which hopefully contains a
decapsulator somewhere). Please see
<file:Documentation/networking/ipddp.txt> for more information. If
you said Y to "AppleTalk-IP driver support" above and you say Y
here, then you cannot say Y to "AppleTalk-IP to IP Decapsulation
support", below.
config IPDDP_DECAP
bool "Appletalk-IP to IP Decapsulation support"
depends on IPDDP
help
If you say Y here, the AppleTalk-IP code will be able to decapsulate
AppleTalk-IP frames to IP packets; this is useful if you want your
Linux box to act as an Internet gateway for an AppleTalk network.
Please see <file:Documentation/networking/ipddp.txt> for more
information. If you said Y to "AppleTalk-IP driver support" above
and you say Y here, then you cannot say Y to "IP to AppleTalk-IP
Encapsulation support", above.
=====
i'm confused. would someone like to suggest how that can be cleaned
up?
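To make the asymmetry concrete, here is a small model of the selection logic quoted from ipddp.c. It is only a sketch: the two Kconfig symbols are modelled as runtime flags (config_encap, config_decap) instead of preprocessor symbols, and the function name ipddp_default_mode() is hypothetical. The IPDDP_ENCAP and IPDDP_DECAP values are the ones from ipddp.h shown in the grep output.

```c
#include <assert.h>

#define IPDDP_ENCAP 1
#define IPDDP_DECAP 2

/*
 * Models the #ifdef CONFIG_IPDDP_ENCAP / #else block in ipddp.c.
 * Only the ENCAP setting is ever consulted; the DECAP setting has
 * no influence on the result.
 */
static int ipddp_default_mode(int config_encap, int config_decap)
{
	assert(config_encap == 0 || config_encap == 1);
	(void)config_decap;	/* never examined, like CONFIG_IPDDP_DECAP */
	return config_encap ? IPDDP_ENCAP : IPDDP_DECAP;
}
```

In this model, enabling both options yields encapsulation mode and enabling neither yields decapsulation mode, which is why the two Kconfig entries behave like a single either/or switch even though they are declared as independent bools.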
rday
--
========================================================================
Robert P. J. Day Waterloo, Ontario, CANADA
Linux Consulting, Training and Annoying Kernel Pedantry.
Web page: http://crashcourse.ca
Twitter: http://twitter.com/rpjday
========================================================================
^ permalink raw reply
* Re: [PATCH] slub: fix slab_pad_check()
From: Christoph Lameter @ 2009-09-08 19:57 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Eric Dumazet, Pekka Enberg, Zdenek Kabelac, Patrick McHardy,
Robin Holt, Linux Kernel Mailing List, Jesper Dangaard Brouer,
Linux Netdev List, Netfilter Developers
In-Reply-To: <20090904204335.GG6751@linux.vnet.ibm.com>
On Fri, 4 Sep 2009, Paul E. McKenney wrote:
> We have gotten along fine with only SLAB_DESTROY_BY_RCU for almost
> five years, so I think we are plenty fine with what we have. So, as
> you say, "as the need arises".
These were the glory years where SLAB_DESTROY_BY_RCU was only used for
anonymous vmas. Now Eric has picked it up for the net subsystem. You may
see the RCU use proliferate.
The kmem_cache_destroy rcu barriers did not matter until
SLAB_DESTROY_BY_RCU spread.
^ permalink raw reply
* Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
From: Michael S. Tsirkin @ 2009-09-08 20:14 UTC (permalink / raw)
To: Ira W. Snyder
Cc: netdev, virtualization, kvm, linux-kernel, mingo, linux-mm, akpm,
hpa, gregory.haskins, Rusty Russell, s.hetze
In-Reply-To: <20090908172035.GB319@ovro.caltech.edu>
On Tue, Sep 08, 2009 at 10:20:35AM -0700, Ira W. Snyder wrote:
> On Mon, Sep 07, 2009 at 01:15:37PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Sep 03, 2009 at 11:39:45AM -0700, Ira W. Snyder wrote:
> > > On Thu, Aug 27, 2009 at 07:07:50PM +0300, Michael S. Tsirkin wrote:
> > > > What it is: vhost net is a character device that can be used to reduce
> > > > the number of system calls involved in virtio networking.
> > > > Existing virtio net code is used in the guest without modification.
> > > >
> > > > There's similarity with vringfd, with some differences and reduced scope
> > > > - uses eventfd for signalling
> > > > - structures can be moved around in memory at any time (good for migration)
> > > > - support memory table and not just an offset (needed for kvm)
> > > >
> > > > common virtio related code has been put in a separate file vhost.c and
> > > > can be made into a separate module if/when more backends appear. I used
> > > > Rusty's lguest.c as the source for developing this part : this supplied
> > > > me with witty comments I wouldn't be able to write myself.
> > > >
> > > > What it is not: vhost net is not a bus, and not a generic new system
> > > > call. No assumptions are made on how guest performs hypercalls.
> > > > Userspace hypervisors are supported as well as kvm.
> > > >
> > > > How it works: Basically, we connect virtio frontend (configured by
> > > > userspace) to a backend. The backend could be a network device, or a
> > > > tun-like device. In this version I only support raw socket as a backend,
> > > > which can be bound to e.g. SR IOV, or to macvlan device. Backend is
> > > > also configured by userspace, including vlan/mac etc.
> > > >
> > > > Status:
> > > > This works for me, and I haven't seen any crashes.
> > > > I have done some light benchmarking (with v4), compared to userspace, I
> > > > see improved latency (as I save up to 4 system calls per packet) but not
> > > > bandwidth/CPU (as TSO and interrupt mitigation are not supported). For
> > > > ping benchmark (where there's no TSO) throughput is also improved.
> > > >
> > > > Features that I plan to look at in the future:
> > > > - tap support
> > > > - TSO
> > > > - interrupt mitigation
> > > > - zero copy
> > > >
> > >
> > > Hello Michael,
> > >
> > > I've started looking at vhost with the intention of using it over PCI to
> > > connect physical machines together.
> > >
> > > The part that I am struggling with the most is figuring out which parts
> > > of the rings are in the host's memory, and which parts are in the
> > > guest's memory.
> >
> > All rings are in guest's memory, to match existing virtio code.
>
> Ok, this makes sense.
>
> > vhost
> > assumes that the memory space of the hypervisor userspace process covers
> > the whole of guest memory.
>
> Is this necessary? Why?
Because the virtio ring can give us arbitrary guest addresses. If
guest was limited to using a subset of addresses, hypervisor would only
have to map these.
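A minimal sketch of why that matters, assuming a memory table made of (guest address, size, userspace address) regions. The struct and function names here (mem_region, translate_guest_addr) are hypothetical and are not the vhost interface; the point is only that an address taken from the ring can be used only if some region covers it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory table entry: one contiguous guest range. */
struct mem_region {
	uint64_t guest_addr;		/* start of the range in guest memory */
	uint64_t size;			/* length of the range in bytes */
	uint64_t userspace_addr;	/* where the range is mapped in the hypervisor process */
};

/*
 * Translate [gpa, gpa + len) to a userspace address.
 * Returns 0 and stores the result in *out on success, or -1 if no
 * single region covers the whole range.
 */
static int translate_guest_addr(const struct mem_region *tbl, size_t n,
				uint64_t gpa, uint64_t len, uint64_t *out)
{
	size_t i;

	assert(out != NULL);
	for (i = 0; i < n; i++) {
		const struct mem_region *r = &tbl[i];
		uint64_t off;

		if (gpa < r->guest_addr || len > r->size)
			continue;
		off = gpa - r->guest_addr;
		if (off > r->size - len)	/* overflow-safe bounds check */
			continue;
		*out = r->userspace_addr + off;
		return 0;
	}
	return -1;
}
```

If the guest may put any address into a descriptor, any hole in the table turns into a failed lookup, hence the assumption that the table covers all of guest memory.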
> The assumption seems very wrong when you're
> doing data transport between two physical systems via PCI.
> I know vhost has not been designed for this specific situation, but it
> is good to be looking toward other possible uses.
>
> > And there's a translation table.
> > Ring addresses are userspace addresses, they do not undergo translation.
> >
> > > If I understand everything correctly, the rings are all userspace
> > > addresses, which means that they can be moved around in physical memory,
> > > and get pushed out to swap.
> >
> > Unless they are locked, yes.
> >
> > > AFAIK, this is impossible to handle when
> > > connecting two physical systems, you'd need the rings available in IO
> > > memory (PCI memory), so you can ioreadXX() them instead. To the best of
> > > my knowledge, I shouldn't be using copy_to_user() on an __iomem address.
> > > Also, having them migrate around in memory would be a bad thing.
> > >
> > > Also, I'm having trouble figuring out how the packet contents are
> > > actually copied from one system to the other. Could you point this out
> > > for me?
> >
> > The code in net/packet/af_packet.c does it when vhost calls sendmsg.
> >
>
> Ok. The sendmsg() implementation uses memcpy_fromiovec(). Is it possible
> to make this use a DMA engine instead?
Maybe.
> I know this was suggested in an earlier thread.
Yes, it might even give some performance benefit with e.g. I/O AT.
> > > Is there somewhere I can find the userspace code (kvm, qemu, lguest,
> > > etc.) code needed for interacting with the vhost misc device so I can
> > > get a better idea of how userspace is supposed to work?
> >
> > Look in archives for kvm@vger.kernel.org. the subject is qemu-kvm: vhost net.
> >
> > > (Features
> > > negotiation, etc.)
> > >
> >
> > That's not yet implemented as there are no features yet. I'm working on
> > tap support, which will add a feature bit. Overall, qemu does an ioctl
> > to query supported features, and then acks them with another ioctl. I'm
> > also trying to avoid duplicating functionality available elsewhere. So
> > that to check e.g. TSO support, you'd just look at the underlying
> > hardware device you are binding to.
> >
>
> Ok. Do you have plans to support the VIRTIO_NET_F_MRG_RXBUF feature in
> the future? I found that this made an enormous improvement in throughput
> on my virtio-net <-> virtio-net system. Perhaps it isn't needed with
> vhost-net.
Yes, I'm working on it.
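For illustration, the query-then-ack negotiation described in the quoted text can be modelled as a bitmask intersection. This is a sketch under assumptions: the two backend functions stand in for the two ioctls and are not the real vhost calls, and the feature names and bit positions (EXAMPLE_F_MRG_RXBUF, EXAMPLE_F_TAP) are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define EXAMPLE_F_MRG_RXBUF	(1ull << 0)
#define EXAMPLE_F_TAP		(1ull << 1)

struct backend {
	uint64_t supported;	/* what the backend offers */
	uint64_t acked;		/* what userspace has acknowledged */
};

/* Stand-in for the "query supported features" call. */
static uint64_t backend_get_features(const struct backend *b)
{
	return b->supported;
}

/* Stand-in for the "ack" call; rejects bits that were never offered. */
static int backend_ack_features(struct backend *b, uint64_t features)
{
	if (features & ~b->supported)
		return -1;
	b->acked = features;
	return 0;
}

/* Userspace side: ack only what both sides understand. */
static uint64_t negotiate(struct backend *b, uint64_t wanted)
{
	uint64_t common = backend_get_features(b) & wanted;
	int rc = backend_ack_features(b, common);

	assert(rc == 0);	/* common is a subset of supported by construction */
	(void)rc;
	return common;
}
```

Acking the intersection means a newer userspace can run against an older backend (and the reverse) without either side enabling something the other does not understand.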
> Thanks for replying,
> Ira
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply
* Re: ipw2200: firmware DMA loading rework
From: Simon Kitching @ 2009-09-08 20:39 UTC (permalink / raw)
To: Mel Gorman
Cc: Theodore Tso, Luis R. Rodriguez, Bartlomiej Zolnierkiewicz,
Aneesh Kumar K.V, Zhu Yi, Andrew Morton, Johannes Weiner,
Pekka Enberg, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Mel Gorman, netdev@vger.kernel.org,
linux-mm@kvack.org, James Ketrenos, Chatre, Reinette,
linux-wireless@vger.kernel.org,
"ipw2100-devel@lists.sourceforge.net" <ipw2100-
In-Reply-To: <20090908110041.GE28127@csn.ul.ie>
On Tue, 2009-09-08 at 12:00 +0100, Mel Gorman wrote:
> On Sat, Sep 05, 2009 at 10:28:37AM -0400, Theodore Tso wrote:
> > On Thu, Sep 03, 2009 at 01:49:14PM +0100, Mel Gorman wrote:
> > > >
> > > > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > > > it is the same issue, which from the trace it seems it is, then this
> > > > is due to an extra kmalloc() allocation and this apparently will not
> > > > get fixed on 2.6.31 due to the closeness of the merge window and the
> > > > non-criticalness this issue has been deemed.
> >
> > No, it's a different problem.
> >
> > > I suspect the more pressing concern is why is this kmalloc() resulting in
> > > an order-5 allocation request? What size is the buffer being requested?
> > > Was that expected? What is the contents of /proc/slabinfo in case a buffer
> > > that should have required order-1 or order-2 is using a higher order for
> > > some reason.
> >
> > It's allocating 68,000 bytes for the mb_history structure, which is
> > used for debugging purposes. That's why it's optional and we continue
> > if it's not allocated. We should fix it to use vmalloc()
>
> You could call kmalloc(FLAGS|__GFP_NOWARN) with a fallback to
> vmalloc() and a disable if vmalloc() fails as well. Maybe check out what
> kernel/profile.c#profile_init() does to allocate a large buffer and do
> something similar?
>
> > and I'm
> > inclined to turn it off by default since it's not worth the overhead,
> > and most ext4 users won't find it useful or interesting.
> >
>
> I can't comment as I don't know what sort of debugging it's useful for.
>
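The order-5 figure follows directly from the 68,000-byte size. A minimal sketch of the arithmetic, assuming 4096-byte pages (the helper name alloc_order() and the page-size constant are hypothetical, not the kernel's own get_order()):

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/*
 * Smallest order such that 2^order pages hold "size" bytes.
 * Physically contiguous allocations come in power-of-two page counts.
 */
static unsigned int alloc_order(size_t size)
{
	size_t pages = (size + EXAMPLE_PAGE_SIZE - 1) / EXAMPLE_PAGE_SIZE;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	assert(((size_t)1 << order) >= pages);
	return order;
}
```

68,000 bytes need 17 pages, and the next power of two is 32 pages, so alloc_order(68000) is 5: almost half of the 128 KiB block is slack, and the whole block has to be physically contiguous.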
Perhaps this is a suitable use for the new proposed flex_array? From an
initial glance, I can't see why the allocated memory has to be
contiguous..
http://lwn.net/Articles/345273/
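To show the idea (not the flex_array API itself), here is a userspace sketch of an array stored in page-sized chunks that are allocated on first use, so no large contiguous block is ever needed. All names (chunked_array, chunked_alloc, chunked_get, chunked_free) and the 4096-byte chunk size are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK_BYTES 4096u	/* assumption: one chunk per 4 KiB page */

struct chunked_array {
	size_t elem_size;	/* bytes per element */
	size_t nr_elems;	/* total number of elements */
	size_t per_chunk;	/* elements that fit in one chunk */
	size_t nr_chunks;	/* number of chunk slots */
	void **chunks;		/* chunk pointers, filled in lazily */
};

static struct chunked_array *chunked_alloc(size_t elem_size, size_t nr_elems)
{
	struct chunked_array *a;

	if (elem_size == 0 || elem_size > CHUNK_BYTES)
		return NULL;
	a = calloc(1, sizeof(*a));
	if (!a)
		return NULL;
	a->elem_size = elem_size;
	a->nr_elems = nr_elems;
	a->per_chunk = CHUNK_BYTES / elem_size;
	a->nr_chunks = (nr_elems + a->per_chunk - 1) / a->per_chunk;
	a->chunks = calloc(a->nr_chunks ? a->nr_chunks : 1, sizeof(void *));
	if (!a->chunks) {
		free(a);
		return NULL;
	}
	return a;
}

/* Returns a pointer to element idx, allocating its chunk if needed. */
static void *chunked_get(struct chunked_array *a, size_t idx)
{
	size_t c;

	assert(a != NULL);
	if (idx >= a->nr_elems)
		return NULL;
	c = idx / a->per_chunk;
	if (!a->chunks[c]) {
		a->chunks[c] = calloc(1, CHUNK_BYTES);
		if (!a->chunks[c])
			return NULL;
	}
	return (char *)a->chunks[c] + (idx % a->per_chunk) * a->elem_size;
}

static void chunked_free(struct chunked_array *a)
{
	size_t i;

	if (!a)
		return;
	for (i = 0; i < a->nr_chunks; i++)
		free(a->chunks[i]);
	free(a->chunks);
	free(a);
}
```

With a hypothetical 40-byte record, 1700 records (68,000 bytes in total) would be spread over 17 independent single-page chunks plus a small pointer table, at the cost of one extra pointer lookup per access.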
Cheers, Simon
^ permalink raw reply
* [PATCH] drivers/net ks8851 MLL network driver
From: Choi, David @ 2009-09-08 20:43 UTC (permalink / raw)
To: philb; +Cc: netdev
Hello Philip,
From : David J. Choi (david.choi@micrel.com)
Body of explanation : This is the first version of ks8851 16bit MLL
Ethernet network driver from Micrel Inc.
Kernel-version : 2.6.31.rc3
Signed of work:
Developer's Certificate of Origin 1.1
Signed-off-by: David J. Choi <david.choi@micrel.com>
------------------------------------
--- linux-2.6.31-rc3/drivers/net/ks8851_mll.c.orig 2009-09-08 09:04:56.000000000 -0700
+++ linux-2.6.31-rc3/drivers/net/ks8851_mll.c 2009-09-08 09:03:06.000000000 -0700
@@ -0,0 +1,1701 @@
+/**
+ * drivers/net/ks8851_mll.c
+ * Copyright (c) 2009 Micrel Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+/**
+ * Supports:
+ * KS8851 16bit MLL chip from Micrel Inc.
+ */
+
+#define DEBUG
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/cache.h>
+#include <linux/crc32.h>
+#include <linux/mii.h>
+#include <linux/platform_device.h>
+#include <linux/delay.h>
+
+#define DRV_NAME "ks8851_mll"
+
+static u8 KS_DEFAULT_MAC_ADDRESS[] = { 0x00, 0x10, 0xA1, 0x86, 0x95, 0x11 };
+#define MAX_RECV_FRAMES 32
+#define MAX_BUF_SIZE 2048
+#define TX_BUF_SIZE 2000
+#define RX_BUF_SIZE 2000
+
+#define KS_CCR 0x08
+#define CCR_EEPROM (1 << 9)
+#define CCR_SPI (1 << 8)
+#define CCR_8BIT (1 << 7)
+#define CCR_16BIT (1 << 6)
+#define CCR_32BIT (1 << 5)
+#define CCR_SHARED (1 << 4)
+#define CCR_32PIN (1 << 0)
+
+/* MAC address registers */
+#define KS_MARL 0x10
+#define KS_MARM 0x12
+#define KS_MARH 0x14
+
+#define KS_OBCR 0x20
+#define OBCR_ODS_16MA (1 << 6)
+
+#define KS_EEPCR 0x22
+#define EEPCR_EESA (1 << 4)
+#define EEPCR_EESB (1 << 3)
+#define EEPCR_EEDO (1 << 2)
+#define EEPCR_EESCK (1 << 1)
+#define EEPCR_EECS (1 << 0)
+
+#define KS_MBIR 0x24
+#define MBIR_TXMBF (1 << 12)
+#define MBIR_TXMBFA (1 << 11)
+#define MBIR_RXMBF (1 << 4)
+#define MBIR_RXMBFA (1 << 3)
+
+#define KS_GRR 0x26
+#define GRR_QMU (1 << 1)
+#define GRR_GSR (1 << 0)
+
+#define KS_WFCR 0x2A
+#define WFCR_MPRXE (1 << 7)
+#define WFCR_WF3E (1 << 3)
+#define WFCR_WF2E (1 << 2)
+#define WFCR_WF1E (1 << 1)
+#define WFCR_WF0E (1 << 0)
+
+#define KS_WF0CRC0 0x30
+#define KS_WF0CRC1 0x32
+#define KS_WF0BM0 0x34
+#define KS_WF0BM1 0x36
+#define KS_WF0BM2 0x38
+#define KS_WF0BM3 0x3A
+
+#define KS_WF1CRC0 0x40
+#define KS_WF1CRC1 0x42
+#define KS_WF1BM0 0x44
+#define KS_WF1BM1 0x46
+#define KS_WF1BM2 0x48
+#define KS_WF1BM3 0x4A
+
+#define KS_WF2CRC0 0x50
+#define KS_WF2CRC1 0x52
+#define KS_WF2BM0 0x54
+#define KS_WF2BM1 0x56
+#define KS_WF2BM2 0x58
+#define KS_WF2BM3 0x5A
+
+#define KS_WF3CRC0 0x60
+#define KS_WF3CRC1 0x62
+#define KS_WF3BM0 0x64
+#define KS_WF3BM1 0x66
+#define KS_WF3BM2 0x68
+#define KS_WF3BM3 0x6A
+
+#define KS_TXCR 0x70
+#define TXCR_TCGICMP (1 << 8)
+#define TXCR_TCGUDP (1 << 7)
+#define TXCR_TCGTCP (1 << 6)
+#define TXCR_TCGIP (1 << 5)
+#define TXCR_FTXQ (1 << 4)
+#define TXCR_TXFCE (1 << 3)
+#define TXCR_TXPE (1 << 2)
+#define TXCR_TXCRC (1 << 1)
+#define TXCR_TXE (1 << 0)
+
+#define KS_TXSR 0x72
+#define TXSR_TXLC (1 << 13)
+#define TXSR_TXMC (1 << 12)
+#define TXSR_TXFID_MASK (0x3f << 0)
+#define TXSR_TXFID_SHIFT (0)
+#define TXSR_TXFID_GET(_v) (((_v) >> 0) & 0x3f)
+
+
+#define KS_RXCR1 0x74
+#define RXCR1_FRXQ (1 << 15)
+#define RXCR1_RXUDPFCC (1 << 14)
+#define RXCR1_RXTCPFCC (1 << 13)
+#define RXCR1_RXIPFCC (1 << 12)
+#define RXCR1_RXPAFMA (1 << 11)
+#define RXCR1_RXFCE (1 << 10)
+#define RXCR1_RXEFE (1 << 9)
+#define RXCR1_RXMAFMA (1 << 8)
+#define RXCR1_RXBE (1 << 7)
+#define RXCR1_RXME (1 << 6)
+#define RXCR1_RXUE (1 << 5)
+#define RXCR1_RXAE (1 << 4)
+#define RXCR1_RXINVF (1 << 1)
+#define RXCR1_RXE (1 << 0)
+#define RXCR1_FILTER_MASK (RXCR1_RXINVF | RXCR1_RXAE | \
+ RXCR1_RXMAFMA | RXCR1_RXPAFMA)
+
+#define KS_RXCR2 0x76
+#define RXCR2_SRDBL_MASK (0x7 << 5)
+#define RXCR2_SRDBL_SHIFT (5)
+#define RXCR2_SRDBL_4B (0x0 << 5)
+#define RXCR2_SRDBL_8B (0x1 << 5)
+#define RXCR2_SRDBL_16B (0x2 << 5)
+#define RXCR2_SRDBL_32B (0x3 << 5)
+/* #define RXCR2_SRDBL_FRAME (0x4 << 5) */
+#define RXCR2_IUFFP (1 << 4)
+#define RXCR2_RXIUFCEZ (1 << 3)
+#define RXCR2_UDPLFE (1 << 2)
+#define RXCR2_RXICMPFCC (1 << 1)
+#define RXCR2_RXSAF (1 << 0)
+
+#define KS_TXMIR 0x78
+
+#define KS_RXFHSR 0x7C
+#define RXFSHR_RXFV (1 << 15)
+#define RXFSHR_RXICMPFCS (1 << 13)
+#define RXFSHR_RXIPFCS (1 << 12)
+#define RXFSHR_RXTCPFCS (1 << 11)
+#define RXFSHR_RXUDPFCS (1 << 10)
+#define RXFSHR_RXBF (1 << 7)
+#define RXFSHR_RXMF (1 << 6)
+#define RXFSHR_RXUF (1 << 5)
+#define RXFSHR_RXMR (1 << 4)
+#define RXFSHR_RXFT (1 << 3)
+#define RXFSHR_RXFTL (1 << 2)
+#define RXFSHR_RXRF (1 << 1)
+#define RXFSHR_RXCE (1 << 0)
+#define RXFSHR_ERR (RXFSHR_RXCE | RXFSHR_RXRF |\
+ RXFSHR_RXFTL | RXFSHR_RXMR |\
+ RXFSHR_RXICMPFCS | RXFSHR_RXIPFCS |\
+ RXFSHR_RXTCPFCS)
+#define KS_RXFHBCR 0x7E
+#define RXFHBCR_CNT_MASK 0x0FFF
+
+#define KS_TXQCR 0x80
+#define TXQCR_AETFE (1 << 2)
+#define TXQCR_TXQMAM (1 << 1)
+#define TXQCR_METFE (1 << 0)
+
+#define KS_RXQCR 0x82
+#define RXQCR_RXDTTS (1 << 12)
+#define RXQCR_RXDBCTS (1 << 11)
+#define RXQCR_RXFCTS (1 << 10)
+#define RXQCR_RXIPHTOE (1 << 9)
+#define RXQCR_RXDTTE (1 << 7)
+#define RXQCR_RXDBCTE (1 << 6)
+#define RXQCR_RXFCTE (1 << 5)
+#define RXQCR_ADRFE (1 << 4)
+#define RXQCR_SDA (1 << 3)
+#define RXQCR_RRXEF (1 << 0)
+#define RXQCR_CMD_CNTL (RXQCR_RXFCTE|RXQCR_ADRFE)
+
+#define KS_TXFDPR 0x84
+#define TXFDPR_TXFPAI (1 << 14)
+#define TXFDPR_TXFP_MASK (0x7ff << 0)
+#define TXFDPR_TXFP_SHIFT (0)
+
+#define KS_RXFDPR 0x86
+#define RXFDPR_RXFPAI (1 << 14)
+
+#define KS_RXDTTR 0x8C
+#define KS_RXDBCTR 0x8E
+
+#define KS_IER 0x90
+#define KS_ISR 0x92
+#define IRQ_LCI (1 << 15)
+#define IRQ_TXI (1 << 14)
+#define IRQ_RXI (1 << 13)
+#define IRQ_RXOI (1 << 11)
+#define IRQ_TXPSI (1 << 9)
+#define IRQ_RXPSI (1 << 8)
+#define IRQ_TXSAI (1 << 6)
+#define IRQ_RXWFDI (1 << 5)
+#define IRQ_RXMPDI (1 << 4)
+#define IRQ_LDI (1 << 3)
+#define IRQ_EDI (1 << 2)
+#define IRQ_SPIBEI (1 << 1)
+#define IRQ_DEDI (1 << 0)
+
+#define KS_RXFCTR 0x9C
+#define RXFCTR_THRESHOLD_MASK 0x00FF
+
+#define KS_RXFC 0x9D
+#define RXFCTR_RXFC_MASK (0xff << 8)
+#define RXFCTR_RXFC_SHIFT (8)
+#define RXFCTR_RXFC_GET(_v) (((_v) >> 8) & 0xff)
+#define RXFCTR_RXFCT_MASK (0xff << 0)
+#define RXFCTR_RXFCT_SHIFT (0)
+
+#define KS_TXNTFSR 0x9E
+
+#define KS_MAHTR0 0xA0
+#define KS_MAHTR1 0xA2
+#define KS_MAHTR2 0xA4
+#define KS_MAHTR3 0xA6
+
+#define KS_FCLWR 0xB0
+#define KS_FCHWR 0xB2
+#define KS_FCOWR 0xB4
+
+#define KS_CIDER 0xC0
+#define CIDER_ID 0x8870
+#define CIDER_REV_MASK (0x7 << 1)
+#define CIDER_REV_SHIFT (1)
+#define CIDER_REV_GET(_v) (((_v) >> 1) & 0x7)
+
+#define KS_CGCR 0xC6
+#define KS_IACR 0xC8
+#define IACR_RDEN (1 << 12)
+#define IACR_TSEL_MASK (0x3 << 10)
+#define IACR_TSEL_SHIFT (10)
+#define IACR_TSEL_MIB (0x3 << 10)
+#define IACR_ADDR_MASK (0x1f << 0)
+#define IACR_ADDR_SHIFT (0)
+
+#define KS_IADLR 0xD0
+#define KS_IAHDR 0xD2
+
+#define KS_PMECR 0xD4
+#define PMECR_PME_DELAY (1 << 14)
+#define PMECR_PME_POL (1 << 12)
+#define PMECR_WOL_WAKEUP (1 << 11)
+#define PMECR_WOL_MAGICPKT (1 << 10)
+#define PMECR_WOL_LINKUP (1 << 9)
+#define PMECR_WOL_ENERGY (1 << 8)
+#define PMECR_AUTO_WAKE_EN (1 << 7)
+#define PMECR_WAKEUP_NORMAL (1 << 6)
+#define PMECR_WKEVT_MASK (0xf << 2)
+#define PMECR_WKEVT_SHIFT (2)
+#define PMECR_WKEVT_GET(_v) (((_v) >> 2) & 0xf)
+#define PMECR_WKEVT_ENERGY (0x1 << 2)
+#define PMECR_WKEVT_LINK (0x2 << 2)
+#define PMECR_WKEVT_MAGICPKT (0x4 << 2)
+#define PMECR_WKEVT_FRAME (0x8 << 2)
+#define PMECR_PM_MASK (0x3 << 0)
+#define PMECR_PM_SHIFT (0)
+#define PMECR_PM_NORMAL (0x0 << 0)
+#define PMECR_PM_ENERGY (0x1 << 0)
+#define PMECR_PM_SOFTDOWN (0x2 << 0)
+#define PMECR_PM_POWERSAVE (0x3 << 0)
+
+/* Standard MII PHY data */
+#define KS_P1MBCR 0xE4
+#define P1MBCR_FORCE_FDX (1 << 8)
+
+#define KS_P1MBSR 0xE6
+#define P1MBSR_AN_COMPLETE (1 << 5)
+#define P1MBSR_AN_CAPABLE (1 << 3)
+#define P1MBSR_LINK_UP (1 << 2)
+
+#define KS_PHY1ILR 0xE8
+#define KS_PHY1IHR 0xEA
+#define KS_P1ANAR 0xEC
+#define KS_P1ANLPR 0xEE
+
+#define KS_P1SCLMD 0xF4
+#define P1SCLMD_LEDOFF (1 << 15)
+#define P1SCLMD_TXIDS (1 << 14)
+#define P1SCLMD_RESTARTAN (1 << 13)
+#define P1SCLMD_DISAUTOMDIX (1 << 10)
+#define P1SCLMD_FORCEMDIX (1 << 9)
+#define P1SCLMD_AUTONEGEN (1 << 7)
+#define P1SCLMD_FORCE100 (1 << 6)
+#define P1SCLMD_FORCEFDX (1 << 5)
+#define P1SCLMD_ADV_FLOW (1 << 4)
+#define P1SCLMD_ADV_100BT_FDX (1 << 3)
+#define P1SCLMD_ADV_100BT_HDX (1 << 2)
+#define P1SCLMD_ADV_10BT_FDX (1 << 1)
+#define P1SCLMD_ADV_10BT_HDX (1 << 0)
+
+#define KS_P1CR 0xF6
+#define P1CR_HP_MDIX (1 << 15)
+#define P1CR_REV_POL (1 << 13)
+#define P1CR_OP_100M (1 << 10)
+#define P1CR_OP_FDX (1 << 9)
+#define P1CR_OP_MDI (1 << 7)
+#define P1CR_AN_DONE (1 << 6)
+#define P1CR_LINK_GOOD (1 << 5)
+#define P1CR_PNTR_FLOW (1 << 4)
+#define P1CR_PNTR_100BT_FDX (1 << 3)
+#define P1CR_PNTR_100BT_HDX (1 << 2)
+#define P1CR_PNTR_10BT_FDX (1 << 1)
+#define P1CR_PNTR_10BT_HDX (1 << 0)
+
+/* TX Frame control */
+
+#define TXFR_TXIC (1 << 15)
+#define TXFR_TXFID_MASK (0x3f << 0)
+#define TXFR_TXFID_SHIFT (0)
+
+#define KS_P1SR 0xF8
+#define P1SR_HP_MDIX (1 << 15)
+#define P1SR_REV_POL (1 << 13)
+#define P1SR_OP_100M (1 << 10)
+#define P1SR_OP_FDX (1 << 9)
+#define P1SR_OP_MDI (1 << 7)
+#define P1SR_AN_DONE (1 << 6)
+#define P1SR_LINK_GOOD (1 << 5)
+#define P1SR_PNTR_FLOW (1 << 4)
+#define P1SR_PNTR_100BT_FDX (1 << 3)
+#define P1SR_PNTR_100BT_HDX (1 << 2)
+#define P1SR_PNTR_10BT_FDX (1 << 1)
+#define P1SR_PNTR_10BT_HDX (1 << 0)
+
+#define ENUM_BUS_NONE 0
+#define ENUM_BUS_8BIT 1
+#define ENUM_BUS_16BIT 2
+#define ENUM_BUS_32BIT 3
+
+#define MAX_MCAST_LST 32
+#define HW_MCAST_SIZE 8
+#define MAC_ADDR_LEN 6
+
+/**
+ * union ks_tx_hdr - tx header data
+ * @txb: The header as bytes
+ * @txw: The header as 16bit, little-endian words
+ *
+ * A dual representation of the tx header data to allow
+ * access to individual bytes, and to allow 16bit accesses
+ * with 16bit alignment.
+ */
+union ks_tx_hdr {
+ u8 txb[4];
+ __le16 txw[2];
+};
+
+/**
+ * struct ks_net - KS8851 driver private data
+ * @netdev : The network device we're bound to
+ * @hw_addr : start address of data register.
+ * @hw_addr_cmd : start address of command register.
+ * @txh : temporary buffer to save status/length.
+ * @lock : Lock to ensure that the device is not accessed when busy.
+ * @pdev : Pointer to platform device.
+ * @mii : The MII state information for the mii calls.
+ * @frame_head_info : frame header information for multi-pkt rx.
+ * @statelock : Lock on this structure for tx list.
+ * @msg_enable : The message flags controlling driver output (see ethtool).
+ * @frame_cnt : number of frames received.
+ * @bus_width : i/o bus width.
+ * @irq : irq number assigned to this device.
+ * @rc_rxqcr : Cached copy of KS_RXQCR.
+ * @rc_txcr : Cached copy of KS_TXCR.
+ * @rc_ier : Cached copy of KS_IER.
+ * @sharedbus : Multiplex (addr and data bus) mode indicator.
+ * @cmd_reg_cache : command register cached.
+ * @cmd_reg_cache_int : command register cached. Used in the irq handler.
+ * @promiscuous : promiscuous mode indicator.
+ * @all_mcast : multicast indicator.
+ * @mcast_lst_size : size of multicast list.
+ * @mcast_lst : multicast list.
+ * @mcast_bits : multicast enabled.
+ * @mac_addr : MAC address assigned to this device.
+ * @fid : frame id.
+ * @extra_byte : number of extra bytes prepended to rx pkt.
+ * @enabled : indicator this device works.
+ *
+ * The @lock ensures that the chip is protected when certain operations are
+ * in progress. When the read or write packet transfer is in progress, most
+ * of the chip registers are not accessible until the transfer is finished and
+ * the DMA has been de-asserted.
+ *
+ * The @statelock is used to protect information in the structure which may
+ * need to be accessed via several sources, such as the network driver layer
+ * or one of the work queues.
+ *
+ */
+#define MALLOC(x) kmalloc(x, GFP_KERNEL)
+
+/* Receive multiplex framer header info */
+struct type_frame_head {
+ u16 sts; /* Frame status */
+ u16 len; /* Byte count */
+};
+
+struct ks_net {
+ struct net_device *netdev;
+ void __iomem *hw_addr;
+ void __iomem *hw_addr_cmd;
+ union ks_tx_hdr txh ____cacheline_aligned;
+ struct mutex lock; /* mutex: must not be taken in interrupt context */
+ struct platform_device *pdev;
+ struct mii_if_info mii;
+ struct type_frame_head *frame_head_info;
+ spinlock_t statelock;
+ u32 msg_enable;
+ u32 frame_cnt;
+ int bus_width;
+ int irq;
+
+ u16 rc_rxqcr;
+ u16 rc_txcr;
+ u16 rc_ier;
+ u16 sharedbus;
+ u16 cmd_reg_cache;
+ u16 cmd_reg_cache_int;
+ u16 promiscuous;
+ u16 all_mcast;
+ u16 mcast_lst_size;
+ u8 mcast_lst[MAX_MCAST_LST][MAC_ADDR_LEN];
+ u8 mcast_bits[HW_MCAST_SIZE];
+ u8 mac_addr[6];
+ u8 fid;
+ u8 extra_byte;
+ u8 enabled;
+};
+
+static int msg_enable;
+
+#define ks_info(_ks, _msg...) dev_info(&(_ks)->pdev->dev, _msg)
+#define ks_warn(_ks, _msg...) dev_warn(&(_ks)->pdev->dev, _msg)
+#define ks_dbg(_ks, _msg...) dev_dbg(&(_ks)->pdev->dev, _msg)
+#define ks_err(_ks, _msg...) dev_err(&(_ks)->pdev->dev, _msg)
+
+#define BE3 0x8000 /* Byte Enable 3 */
+#define BE2 0x4000 /* Byte Enable 2 */
+#define BE1 0x2000 /* Byte Enable 1 */
+#define BE0 0x1000 /* Byte Enable 0 */
+
+/**
+ * register read/write calls.
+ *
+ * All these calls issue transactions to access the chip's registers. They
+ * all require that the necessary lock is held to prevent accesses when the
+ * chip is busy transferring packet data (RX/TX FIFO accesses).
+ */
+
+/**
+ * ks_rdreg8 - read 8 bit register from device
+ * @ks : The chip information
+ * @offset: The register address
+ *
+ * Read an 8bit register from the chip, returning the result
+ */
+static u8 ks_rdreg8(struct ks_net *ks, int offset)
+{
+ u16 data;
+ u8 shift_bit = offset & 0x03;
+ u8 shift_data = (offset & 1) << 3;
+ ks->cmd_reg_cache = (u16) offset | (u16)(BE0 << shift_bit);
+ iowrite16(ks->cmd_reg_cache, ks->hw_addr_cmd);
+ data = ioread16(ks->hw_addr);
+ return (u8)(data >> shift_data);
+}
+
+/**
+ * ks_rdreg16 - read 16 bit register from device
+ * @ks : The chip information
+ * @offset: The register address
+ *
+ * Read a 16bit register from the chip, returning the result
+ */
+
+static u16 ks_rdreg16(struct ks_net *ks, int offset)
+{
+ ks->cmd_reg_cache = (u16)offset | ((BE1 | BE0) << (offset & 0x02));
+ iowrite16(ks->cmd_reg_cache, ks->hw_addr_cmd);
+ return ioread16(ks->hw_addr);
+}
+
+/**
+ * ks_wrreg8 - write 8bit register value to chip
+ * @ks: The chip information
+ * @offset: The register address
+ * @value: The value to write
+ *
+ */
+static void ks_wrreg8(struct ks_net *ks, int offset, u8 value)
+{
+ u8 shift_bit = (offset & 0x03);
+ u16 value_write = (u16)(value << ((offset & 1) << 3));
+ ks->cmd_reg_cache = (u16)offset | (BE0 << shift_bit);
+ iowrite16(ks->cmd_reg_cache, ks->hw_addr_cmd);
+ iowrite16(value_write, ks->hw_addr);
+}
+
+/**
+ * ks_wrreg16 - write 16bit register value to chip
+ * @ks: The chip information
+ * @offset: The register address
+ * @value: The value to write
+ *
+ */
+
+static void ks_wrreg16(struct ks_net *ks, int offset, u16 value)
+{
+ ks->cmd_reg_cache = (u16)offset | ((BE1 | BE0) << (offset & 0x02));
+ iowrite16(ks->cmd_reg_cache, ks->hw_addr_cmd);
+ iowrite16(value, ks->hw_addr);
+}
+
+/**
+ * ks_inblk - read a block of data from QMU.
+ * @ks: The chip state
+ * @wptr: buffer address to save data
+ * @len: length in byte to read
+ *
+ * This is called after pseudo DMA mode is enabled.
+ */
+static inline void ks_inblk(struct ks_net *ks, u16 *wptr, u32 len)
+{
+ /* read 16 bits at a time from the data port; no-op if len < 2 */
+ len >>= 1;
+ while (len--)
+ *wptr++ = (u16)ioread16(ks->hw_addr);
+}
+
+/**
+ * ks_outblk - write data to QMU.
+ * @ks: The chip information
+ * @wptr: buffer address
+ * @len: length in byte to write
+ *
+ * This is called after pseudo DMA mode is enabled.
+ */
+static inline void ks_outblk(struct ks_net *ks, u16 *wptr, u32 len)
+{
+ /* write 16 bits at a time to the data port; no-op if len < 2 */
+ len >>= 1;
+ while (len--)
+ iowrite16(*wptr++, ks->hw_addr);
+}
+
+/**
+ * ks_tx_fifo_space - return the available hardware buffer size.
+ * @ks: The chip information
+ *
+ */
+static inline u16 ks_tx_fifo_space(struct ks_net *ks)
+{
+ return ks_rdreg16(ks, KS_TXMIR) & 0x1fff;
+}
+
+/**
+ * ks_save_cmd_reg - save the command register from the cache.
+ * @ks: The chip information
+ *
+ */
+static inline void ks_save_cmd_reg(struct ks_net *ks)
+{
+ /* The KS8851 MLL has a bug when reading back the command register,
+ * so rely on software to save the content of the command register.
+ */
+ ks->cmd_reg_cache_int = ks->cmd_reg_cache;
+}
+
+/**
+ * ks_restore_cmd_reg - restore the command register from the cache and
+ * write to hardware register.
+ * @ks: The chip information
+ *
+ */
+static inline void ks_restore_cmd_reg(struct ks_net *ks)
+{
+ ks->cmd_reg_cache = ks->cmd_reg_cache_int;
+ iowrite16(ks->cmd_reg_cache, ks->hw_addr_cmd);
+}
+
+/**
+ * ks_set_powermode - set power mode of the device
+ * @ks: The chip information
+ * @pwrmode: The power mode value to write to KS_PMECR.
+ *
+ * Change the power mode of the chip.
+ */
+static void ks_set_powermode(struct ks_net *ks, unsigned pwrmode)
+{
+ unsigned pmecr;
+
+ if (netif_msg_hw(ks))
+ ks_dbg(ks, "setting power mode %d\n", pwrmode);
+
+ ks_rdreg16(ks, KS_GRR);
+ pmecr = ks_rdreg16(ks, KS_PMECR);
+ pmecr &= ~PMECR_PM_MASK;
+ pmecr |= pwrmode;
+
+ ks_wrreg16(ks, KS_PMECR, pmecr);
+}
+
+/**
+ * ks_read_config - read chip configuration of bus width.
+ * @ks: The chip information
+ *
+ */
+static void ks_read_config(struct ks_net *ks)
+{
+ u16 reg_data = 0;
+
+ /* Regardless of bus width, 8 bit read should always work.*/
+ reg_data = ks_rdreg8(ks, KS_CCR) & 0x00FF;
+ reg_data |= ks_rdreg8(ks, KS_CCR+1) << 8;
+
+ /* addr/data bus are multiplexed */
+ ks->sharedbus = (reg_data & CCR_SHARED) == CCR_SHARED;
+
+ /* Reading from the QMU returns garbage (dummy) bytes first;
+ * how many depends on the bus width.
+ */
+
+ if (reg_data & CCR_8BIT) {
+ ks->bus_width = ENUM_BUS_8BIT;
+ ks->extra_byte = 1;
+ } else if (reg_data & CCR_16BIT) {
+ ks->bus_width = ENUM_BUS_16BIT;
+ ks->extra_byte = 2;
+ } else {
+ ks->bus_width = ENUM_BUS_32BIT;
+ ks->extra_byte = 4;
+ }
+}
+
+/**
+ * ks_soft_reset - issue one of the soft resets to the device
+ * @ks: The device state.
+ * @op: The bit(s) to set in the GRR
+ *
+ * Issue the relevant soft-reset command to the device's GRR register
+ * specified by @op.
+ *
+ * Note, the delays are in there as a caution to ensure that the reset
+ * has time to take effect and then complete. Since the datasheet does
+ * not currently specify the exact sequence, we have chosen something
+ * that seems to work with our device.
+ */
+static void ks_soft_reset(struct ks_net *ks, unsigned op)
+{
+ /* Disable interrupt first */
+ ks_wrreg16(ks, KS_IER, 0x0000);
+ ks_wrreg16(ks, KS_GRR, op);
+ mdelay(10); /* wait a short time to effect reset */
+ ks_wrreg16(ks, KS_GRR, 0);
+ mdelay(1); /* wait for condition to clear */
+}
+
+
+/**
+ * ks_read_qmu - read 1 pkt data from the QMU.
+ * @ks: The chip information
+ * @buf: buffer address to save 1 pkt
+ * @len: Pkt length
+ * Here is the sequence to read 1 pkt:
+ * 1. set pseudo DMA mode
+ * 2. read prepend data
+ * 3. read pkt data
+ * 4. reset pseudo DMA mode
+ */
+static inline void ks_read_qmu(struct ks_net *ks, u16 *buf, u32 len)
+{
+ u32 r = ks->extra_byte & 0x1 ;
+ u32 w = ks->extra_byte - r;
+
+ /* 1. set pseudo DMA mode */
+ ks_wrreg16(ks, KS_RXFDPR, RXFDPR_RXFPAI);
+ ks_wrreg8(ks, KS_RXQCR, (ks->rc_rxqcr | RXQCR_SDA) & 0xff);
+
+ /* 2. read prepend data */
+ /**
+ * read 4 + extra bytes and discard them.
+ * extra bytes for dummy, 2 for status, 2 for len
+ */
+
+ /* r is only set on an 8 bit bus; use likely(r) there for performance */
+ if (unlikely(r))
+ ioread8(ks->hw_addr);
+ ks_inblk(ks, buf, w + 2 + 2);
+
+ /* 3. read pkt data */
+ ks_inblk(ks, buf, ALIGN(len, 4));
+
+ /* 4. reset pseudo DMA mode */
+ ks_wrreg8(ks, KS_RXQCR, ks->rc_rxqcr);
+}
+
+/**
+ * ks_rcv - read multiple pkts data from the QMU.
+ * @ks: The chip information
+ * @netdev: The network device the packets are delivered to.
+ *
+ * Read all of the header information before reading the pkt content.
+ * Leaving only part of the pkts in the QMU after issuing the
+ * interrupt ack is not allowed.
+ */
+static void ks_rcv(struct ks_net *ks, struct net_device *netdev)
+{
+ u32 i;
+ struct type_frame_head *frame_hdr = ks->frame_head_info;
+ struct sk_buff *skb;
+
+ ks->frame_cnt = ks_rdreg16(ks, KS_RXFCTR) >> 8;
+
+ /* read all header information */
+ for (i = 0; i < ks->frame_cnt; i++) {
+ /* Checking Received packet status */
+ frame_hdr->sts = ks_rdreg16(ks, KS_RXFHSR);
+ /* Get packet len from hardware */
+ frame_hdr->len = ks_rdreg16(ks, KS_RXFHBCR);
+ frame_hdr++;
+ }
+
+ frame_hdr = ks->frame_head_info;
+ while (ks->frame_cnt--) {
+ skb = dev_alloc_skb(frame_hdr->len + 16);
+ if (likely(skb && (frame_hdr->sts & RXFSHR_RXFV) &&
+ (frame_hdr->len < RX_BUF_SIZE) && frame_hdr->len)) {
+ skb_reserve(skb, 2);
+ /* read data block including CRC 4 bytes */
+ ks_read_qmu(ks, (u16 *)skb->data, frame_hdr->len + 4);
+ skb_put(skb, frame_hdr->len);
+ skb->dev = netdev;
+ skb->protocol = eth_type_trans(skb, netdev);
+ netif_rx(skb);
+ } else {
+ /* skb allocation failed or the frame is invalid: drop it */
+ printk(KERN_ERR "%s: err:skb alloc\n", __func__);
+ ks_wrreg16(ks, KS_RXQCR, (ks->rc_rxqcr | RXQCR_RRXEF));
+ if (skb)
+ dev_kfree_skb_irq(skb);
+ }
+ frame_hdr++;
+ }
+}
+
+/**
+ * ks_update_link_status - link status update.
+ * @netdev: The network device whose link state is reported.
+ * @ks: The chip information
+ *
+ */
+
+static void ks_update_link_status(struct net_device *netdev, struct ks_net *ks)
+{
+ /* check the status of the link */
+ u32 link_up_status;
+ if (ks_rdreg16(ks, KS_P1SR) & P1SR_LINK_GOOD) {
+ netif_carrier_on(netdev);
+ link_up_status = true;
+ } else {
+ netif_carrier_off(netdev);
+ link_up_status = false;
+ }
+ if (netif_msg_link(ks))
+ ks_dbg(ks, "%s: %s\n",
+ __func__, link_up_status ? "UP" : "DOWN");
+}
+
+/**
+ * ks_irq - device interrupt handler
+ * @irq: Interrupt number passed from the IRQ handler.
+ * @pw: The private word passed to request_irq(), our struct ks_net.
+ *
+ * This is the handler invoked to find out what happened
+ *
+ * Read the interrupt status, work out what needs to be done and then clear
+ * any of the interrupts that are not needed.
+ */
+
+static irqreturn_t ks_irq(int irq, void *pw)
+{
+ struct ks_net *ks = pw;
+ struct net_device *netdev = ks->netdev;
+ u16 status;
+
+ /*this should be the first in IRQ handler */
+ ks_save_cmd_reg(ks);
+
+ status = ks_rdreg16(ks, KS_ISR);
+ ks_wrreg16(ks, KS_ISR, status);
+
+ if (likely(status & IRQ_RXI))
+ ks_rcv(ks, netdev);
+
+ if (unlikely(status & IRQ_LCI))
+ ks_update_link_status(netdev, ks);
+
+ if (unlikely(status & IRQ_TXI))
+ netif_wake_queue(netdev);
+
+ if (unlikely(status & IRQ_LDI)) {
+
+ u16 pmecr = ks_rdreg16(ks, KS_PMECR);
+ pmecr &= ~PMECR_WKEVT_MASK;
+ ks_wrreg16(ks, KS_PMECR, pmecr | PMECR_WKEVT_LINK);
+ }
+
+ /* this should be the last in IRQ handler*/
+ ks_restore_cmd_reg(ks);
+ return IRQ_HANDLED;
+}
+
+
+/**
+ * ks_net_open - open network device
+ * @netdev: The network device being opened.
+ *
+ * Called when the network device is marked active, such as a user executing
+ * 'ifconfig up' on the device.
+ */
+static int ks_net_open(struct net_device *netdev)
+{
+ struct ks_net *ks = netdev_priv(netdev);
+ int err;
+
+#define KS_INT_FLAGS (IRQF_DISABLED|IRQF_TRIGGER_LOW)
+ /* lock the card, even if we may not actually do anything
+ * else at the moment.
+ */
+ mutex_lock(&ks->lock);
+
+ if (netif_msg_ifup(ks))
+ ks_dbg(ks, "%s - entry\n", __func__);
+
+ /* hook up the interrupt handler */
+ err = request_irq(ks->irq, ks_irq, KS_INT_FLAGS, DRV_NAME, ks);
+
+ if (err) {
+ printk(KERN_ERR "Failed to request IRQ: %d: %d\n",
+ ks->irq, err);
+ mutex_unlock(&ks->lock);
+ return err;
+ }
+
+ if (netif_msg_ifup(ks))
+ ks_dbg(ks, "network device %s up\n", netdev->name);
+
+ mutex_unlock(&ks->lock);
+
+ return 0;
+}
+
+/**
+ * ks_net_stop - close network device
+ * @netdev: The device being closed.
+ *
+ * Called to close down a network device which has been active. Cancel any
+ * work, shutdown the RX and TX process and then place the chip into a low
+ * power state whilst it is not being used.
+ */
+static int ks_net_stop(struct net_device *netdev)
+{
+ struct ks_net *ks = netdev_priv(netdev);
+
+ if (netif_msg_ifdown(ks))
+ ks_info(ks, "%s: shutting down\n", netdev->name);
+
+ netif_stop_queue(netdev);
+
+ kfree(ks->frame_head_info);
+
+ mutex_lock(&ks->lock);
+
+ /* turn off the IRQs and ack any outstanding */
+ ks_wrreg16(ks, KS_IER, 0x0000);
+ ks_wrreg16(ks, KS_ISR, 0xffff);
+
+ /* shutdown RX process */
+ ks_wrreg16(ks, KS_RXCR1, 0x0000);
+
+ /* shutdown TX process */
+ ks_wrreg16(ks, KS_TXCR, 0x0000);
+
+ /* set powermode to soft power down to save power */
+ ks_set_powermode(ks, PMECR_PM_SOFTDOWN);
+ free_irq(ks->irq, ks); /* dev_id must match the one given to request_irq() */
+ mutex_unlock(&ks->lock);
+ return 0;
+}
+
+
+/**
+ * ks_write_qmu - write 1 pkt data to the QMU.
+ * @ks: The chip information
+ * @pdata: buffer address of the pkt to write
+ * @len: Pkt length in byte
+ * Here is the sequence to write 1 pkt:
+ * 1. set pseudo DMA mode
+ * 2. write status/length
+ * 3. write pkt data
+ * 4. reset pseudo DMA mode
+ * 5. enqueue the pkt (move it from the TX buffer into the TXQ)
+ * 6. Wait until pkt is out
+ */
+static void ks_write_qmu(struct ks_net *ks, u8 *pdata, u16 len)
+{
+ unsigned fid = ks->fid;
+
+ ks->fid = (ks->fid + 1) & TXFR_TXFID_MASK;
+
+ /* reduce the tx interrupt occurrences. */
+ if (!fid)
+ fid |= TXFR_TXIC; /* irq on completion */
+
+ /* start header at txb[0] to align txw entries */
+ ks->txh.txw[0] = cpu_to_le16(fid);
+ ks->txh.txw[1] = cpu_to_le16(len);
+
+ /* 1. set pseudo-DMA mode */
+ ks_wrreg8(ks, KS_RXQCR, (ks->rc_rxqcr | RXQCR_SDA) & 0xff);
+ /* 2. write status/length info */
+ ks_outblk(ks, ks->txh.txw, 4);
+ /* 3. write pkt data */
+ ks_outblk(ks, (u16 *)pdata, ALIGN(len, 4));
+ /* 4. reset pseudo-DMA mode */
+ ks_wrreg8(ks, KS_RXQCR, ks->rc_rxqcr);
+ /* 5. Enqueue Tx(move the pkt from TX buffer into TXQ) */
+ ks_wrreg16(ks, KS_TXQCR, TXQCR_METFE);
+ /* 6. wait until TXQCR_METFE is auto-cleared */
+ while (ks_rdreg16(ks, KS_TXQCR) & TXQCR_METFE)
+ ;
+}
+
+static void ks_disable_int(struct ks_net *ks)
+{
+ ks_wrreg16(ks, KS_IER, 0x0000);
+} /* ks_disable_int */
+
+static void ks_enable_int(struct ks_net *ks)
+{
+ ks_wrreg16(ks, KS_IER, ks->rc_ier);
+} /* ks_enable_int */
+
+/**
+ * ks_start_xmit - transmit packet
+ * @skb : The buffer to transmit
+ * @netdev : The device used to transmit the packet.
+ *
+ * Called by the network layer to transmit the @skb.
+ * TX and RX must be mutually exclusive, so the device interrupt is
+ * disabled (disable_irq plus clearing KS_IER) while TX is in progress.
+ */
+static int ks_start_xmit(struct sk_buff *skb, struct net_device *netdev)
+{
+ int retv = NETDEV_TX_OK;
+ struct ks_net *ks = netdev_priv(netdev);
+
+ /* use ks->irq: probe stores the IRQ there and never sets netdev->irq */
+ disable_irq(ks->irq);
+ ks_disable_int(ks);
+ spin_lock(&ks->statelock);
+
+ /* Extra space is required:
+ * 4 bytes for alignment, 4 for status/length, 4 for CRC
+ */
+
+ if (likely(ks_tx_fifo_space(ks) >= skb->len + 12)) {
+ ks_write_qmu(ks, skb->data, skb->len);
+ dev_kfree_skb(skb);
+ } else
+ retv = NETDEV_TX_BUSY;
+ spin_unlock(&ks->statelock);
+ ks_enable_int(ks);
+ enable_irq(ks->irq);
+ return retv;
+}
+
+/**
+ * ks_start_rx - ready to serve pkts
+ * @ks : The chip information
+ *
+ */
+static void ks_start_rx(struct ks_net *ks)
+{
+ u16 cntl;
+
+ /* Enables QMU Receive (RXCR1). */
+ cntl = ks_rdreg16(ks, KS_RXCR1);
+ cntl |= RXCR1_RXE ;
+ ks_wrreg16(ks, KS_RXCR1, cntl);
+} /* ks_start_rx */
+
+/**
+ * ks_stop_rx - stop to serve pkts
+ * @ks : The chip information
+ *
+ */
+static void ks_stop_rx(struct ks_net *ks)
+{
+ u16 cntl;
+
+ /* Disables QMU Receive (RXCR1). */
+ cntl = ks_rdreg16(ks, KS_RXCR1);
+ cntl &= ~RXCR1_RXE ;
+ ks_wrreg16(ks, KS_RXCR1, cntl);
+
+} /* ks_stop_rx */
+
+static const u32 ethernet_polynomial = 0x04c11db7U;
+
+/* use a fixed 32 bit type so the top-bit test also works where long is 64 bit */
+static u32 ether_gen_crc(int length, u8 *data)
+{
+ u32 crc = 0xffffffff;
+ while (--length >= 0) {
+ u8 current_octet = *data++;
+ int bit;
+
+ for (bit = 0; bit < 8; bit++, current_octet >>= 1) {
+ crc = (crc << 1) ^
+ (((crc >> 31) ^ (current_octet & 1)) ?
+ ethernet_polynomial : 0);
+ }
+ }
+ return crc;
+} /* ether_gen_crc */
+
+/**
+* ks_set_grpaddr - set multicast information
+* @ks : The chip information
+*/
+
+static void ks_set_grpaddr(struct ks_net *ks)
+{
+ u8 i;
+ u32 index, position, value;
+
+ memset(ks->mcast_bits, 0, sizeof(u8) * HW_MCAST_SIZE);
+
+ for (i = 0; i < ks->mcast_lst_size; i++) {
+ position = (ether_gen_crc(6, ks->mcast_lst[i]) >> 26) & 0x3f;
+ index = position >> 3;
+ value = 1 << (position & 7);
+ ks->mcast_bits[index] |= (u8)value;
+ }
+
+ for (i = 0; i < HW_MCAST_SIZE; i++) {
+ if (i & 1) {
+ ks_wrreg16(ks, (u16)((KS_MAHTR0 + i) & ~1),
+ (ks->mcast_bits[i] << 8) |
+ ks->mcast_bits[i - 1]);
+ }
+ }
+} /* ks_set_grpaddr */
+
+/*
+* ks_clear_mcast - clear multicast information
+*
+* @ks : The chip information
+* This routine removes all mcast addresses set in the hardware.
+*/
+
+static void ks_clear_mcast(struct ks_net *ks)
+{
+ u16 i, mcast_size;
+ for (i = 0; i < HW_MCAST_SIZE; i++)
+ ks->mcast_bits[i] = 0;
+
+ /* one 16 bit hash register per two bytes of mcast_bits */
+ mcast_size = HW_MCAST_SIZE >> 1;
+ for (i = 0; i < mcast_size; i++)
+ ks_wrreg16(ks, KS_MAHTR0 + (2*i), 0);
+}
+
+static void ks_set_promis(struct ks_net *ks, u16 promiscuous_mode)
+{
+ u16 cntl;
+ ks->promiscuous = promiscuous_mode;
+ ks_stop_rx(ks); /* Stop receiving for reconfiguration */
+ cntl = ks_rdreg16(ks, KS_RXCR1);
+
+ cntl &= ~RXCR1_FILTER_MASK;
+ if (promiscuous_mode)
+ /* Enable Promiscuous mode */
+ cntl |= RXCR1_RXAE | RXCR1_RXINVF;
+ else
+ /* Disable Promiscuous mode (default normal mode) */
+ cntl |= RXCR1_RXPAFMA;
+
+ ks_wrreg16(ks, KS_RXCR1, cntl);
+
+ if (ks->enabled)
+ ks_start_rx(ks);
+
+} /* ks_set_promis */
+
+static void ks_set_mcast(struct ks_net *ks, u16 mcast)
+{
+ u16 cntl;
+
+ ks->all_mcast = mcast;
+ ks_stop_rx(ks); /* Stop receiving for reconfiguration */
+ cntl = ks_rdreg16(ks, KS_RXCR1);
+ cntl &= ~RXCR1_FILTER_MASK;
+ if (mcast)
+ /* Enable "Perfect with Multicast address passed mode" */
+ cntl |= (RXCR1_RXAE | RXCR1_RXMAFMA | RXCR1_RXPAFMA);
+ else
+ /**
+ * Disable "Perfect with Multicast address passed
+ * mode" (normal mode).
+ */
+ cntl |= RXCR1_RXPAFMA;
+
+ ks_wrreg16(ks, KS_RXCR1, cntl);
+
+ if (ks->enabled)
+ ks_start_rx(ks);
+} /* ks_set_mcast */
+
+static void ks_set_rx_mode(struct net_device *netdev)
+{
+ struct ks_net *ks = netdev_priv(netdev);
+ struct dev_mc_list *ptr;
+
+ /* Turn on/off promiscuous mode. */
+ if ((netdev->flags & IFF_PROMISC) == IFF_PROMISC)
+ ks_set_promis(ks, true);
+ /* Turn on/off all mcast mode. */
+ else if ((netdev->flags & IFF_ALLMULTI) == IFF_ALLMULTI)
+ ks_set_mcast(ks, true);
+ else
+ ks_set_promis(ks, false);
+
+ if ((netdev->flags & IFF_MULTICAST) && netdev->mc_count) {
+ if (netdev->mc_count <= MAX_MCAST_LST) {
+ int i = 0;
+ for (ptr = netdev->mc_list; ptr; ptr = ptr->next) {
+ if (!(*ptr->dmi_addr & 1))
+ continue;
+ if (i >= MAX_MCAST_LST)
+ break;
+ memcpy(ks->mcast_lst[i++], ptr->dmi_addr,
+ MAC_ADDR_LEN);
+ }
+ ks->mcast_lst_size = (u8)i;
+ ks_set_grpaddr(ks);
+ } else {
+ /**
+ * List too big to support so
+ * turn on all mcast mode.
+ */
+ ks->mcast_lst_size = MAX_MCAST_LST;
+ ks_set_mcast(ks, true);
+ }
+ } else {
+ ks->mcast_lst_size = 0;
+ ks_clear_mcast(ks);
+ }
+} /* ks_set_rx_mode */
+
+static void ks_set_mac(struct ks_net *ks, u8 *data)
+{
+ u16 *pw = (u16 *)data;
+ u16 w, u;
+
+ ks_stop_rx(ks); /* Stop receiving for reconfiguration */
+
+ u = *pw++;
+ w = ((u & 0xFF) << 8) | ((u >> 8) & 0xFF);
+ ks_wrreg16(ks, KS_MARH, w);
+
+ u = *pw++;
+ w = ((u & 0xFF) << 8) | ((u >> 8) & 0xFF);
+ ks_wrreg16(ks, KS_MARM, w);
+
+ u = *pw;
+ w = ((u & 0xFF) << 8) | ((u >> 8) & 0xFF);
+ ks_wrreg16(ks, KS_MARL, w);
+
+ memcpy(ks->mac_addr, data, 6);
+
+ if (ks->enabled)
+ ks_start_rx(ks);
+}
+
+static int ks_set_mac_address(struct net_device *netdev, void *paddr)
+{
+ struct ks_net *ks = netdev_priv(netdev);
+ struct sockaddr *addr = paddr;
+ u8 *da;
+
+ memcpy(netdev->dev_addr, addr->sa_data, netdev->addr_len);
+
+ da = (u8 *)netdev->dev_addr;
+
+ ks_set_mac(ks, da);
+ return 0;
+}
+
+static int ks_net_ioctl(struct net_device *netdev, struct ifreq *req, int cmd)
+{
+ struct ks_net *ks = netdev_priv(netdev);
+
+ if (!netif_running(netdev))
+ return -EINVAL;
+
+ return generic_mii_ioctl(&ks->mii, if_mii(req), cmd, NULL);
+}
+
+static const struct net_device_ops ks_netdev_ops = {
+ .ndo_open = ks_net_open,
+ .ndo_stop = ks_net_stop,
+ .ndo_do_ioctl = ks_net_ioctl,
+ .ndo_start_xmit = ks_start_xmit,
+ .ndo_set_mac_address = ks_set_mac_address,
+ .ndo_set_rx_mode = ks_set_rx_mode,
+ .ndo_change_mtu = eth_change_mtu,
+ .ndo_validate_addr = eth_validate_addr,
+};
+
+/* ethtool support */
+
+static void ks_get_drvinfo(struct net_device *netdev,
+ struct ethtool_drvinfo *di)
+{
+ strlcpy(di->driver, DRV_NAME, sizeof(di->driver));
+ strlcpy(di->version, "1.00", sizeof(di->version));
+ strlcpy(di->bus_info, dev_name(netdev->dev.parent),
+ sizeof(di->bus_info));
+}
+
+static u32 ks_get_msglevel(struct net_device *netdev)
+{
+ struct ks_net *ks = netdev_priv(netdev);
+ return ks->msg_enable;
+}
+
+static void ks_set_msglevel(struct net_device *netdev, u32 to)
+{
+ struct ks_net *ks = netdev_priv(netdev);
+ ks->msg_enable = to;
+}
+
+static int ks_get_settings(struct net_device *netdev, struct ethtool_cmd *cmd)
+{
+ struct ks_net *ks = netdev_priv(netdev);
+ return mii_ethtool_gset(&ks->mii, cmd);
+}
+
+static int ks_set_settings(struct net_device *netdev, struct ethtool_cmd *cmd)
+{
+ struct ks_net *ks = netdev_priv(netdev);
+ return mii_ethtool_sset(&ks->mii, cmd);
+}
+
+static u32 ks_get_link(struct net_device *netdev)
+{
+ struct ks_net *ks = netdev_priv(netdev);
+ return mii_link_ok(&ks->mii);
+}
+
+static int ks_nway_reset(struct net_device *netdev)
+{
+ struct ks_net *ks = netdev_priv(netdev);
+ return mii_nway_restart(&ks->mii);
+}
+
+static const struct ethtool_ops ks_ethtool_ops = {
+ .get_drvinfo = ks_get_drvinfo,
+ .get_msglevel = ks_get_msglevel,
+ .set_msglevel = ks_set_msglevel,
+ .get_settings = ks_get_settings,
+ .set_settings = ks_set_settings,
+ .get_link = ks_get_link,
+ .nway_reset = ks_nway_reset,
+};
+
+/* MII interface controls */
+
+/**
+ * ks_phy_reg - convert MII register into a KS8851 register
+ * @reg: MII register number.
+ *
+ * Return the KS8851 register number for the corresponding MII PHY register
+ * if possible. Return zero if the MII register has no direct mapping to the
+ * KS8851 register set.
+ */
+static int ks_phy_reg(int reg)
+{
+ switch (reg) {
+ case MII_BMCR:
+ return KS_P1MBCR;
+ case MII_BMSR:
+ return KS_P1MBSR;
+ case MII_PHYSID1:
+ return KS_PHY1ILR;
+ case MII_PHYSID2:
+ return KS_PHY1IHR;
+ case MII_ADVERTISE:
+ return KS_P1ANAR;
+ case MII_LPA:
+ return KS_P1ANLPR;
+ }
+
+ return 0x0;
+}
+
+/**
+ * ks_phy_read - MII interface PHY register read.
+ * @netdev: The network device the PHY is on.
+ * @phy_addr: Address of PHY (ignored as we only have one)
+ * @reg: The register to read.
+ *
+ * This call reads data from the PHY register specified in @reg. Since the
+ * device does not support all the MII registers, the non-existent values
+ * are always returned as zero.
+ *
+ * We return zero for unsupported registers as the MII code does not check
+ * the value returned for any error status, and simply returns it to the
+ * caller. The mii-tool that the driver was tested with takes any -ve error
+ * as real PHY capabilities, thus displaying incorrect data to the user.
+ */
+static int ks_phy_read(struct net_device *netdev, int phy_addr, int reg)
+{
+ struct ks_net *ks = netdev_priv(netdev);
+ int ksreg;
+ int result;
+
+ ksreg = ks_phy_reg(reg);
+ if (!ksreg)
+ return 0x0; /* no error return allowed, so use zero */
+
+ mutex_lock(&ks->lock);
+ result = ks_rdreg16(ks, ksreg);
+ mutex_unlock(&ks->lock);
+
+ return result;
+}
+
+static void ks_phy_write(struct net_device *netdev,
+ int phy, int reg, int value)
+{
+ struct ks_net *ks = netdev_priv(netdev);
+ int ksreg;
+
+ ksreg = ks_phy_reg(reg);
+ if (ksreg) {
+ mutex_lock(&ks->lock);
+ ks_wrreg16(ks, ksreg, value);
+ mutex_unlock(&ks->lock);
+ }
+}
+
+/**
+ * ks_read_selftest - read the selftest memory info.
+ * @ks: The device state
+ *
+ * Read and check the TX/RX memory selftest information.
+ */
+static int ks_read_selftest(struct ks_net *ks)
+{
+ unsigned both_done = MBIR_TXMBF | MBIR_RXMBF;
+ int ret = 0;
+ unsigned rd;
+
+ rd = ks_rdreg16(ks, KS_MBIR);
+
+ if ((rd & both_done) != both_done) {
+ ks_warn(ks, "Memory selftest not finished\n");
+ return 0;
+ }
+
+ if (rd & MBIR_TXMBFA) {
+ ks_err(ks, "TX memory selftest fails\n");
+ ret |= 1;
+ }
+
+ if (rd & MBIR_RXMBFA) {
+ ks_err(ks, "RX memory selftest fails\n");
+ ret |= 2;
+ }
+
+ if (!ret)
+ ks_info(ks, "the selftest passes\n");
+ return ret;
+}
+
+static void ks_disable(struct ks_net *ks)
+{
+ u16 w;
+
+ w = ks_rdreg16(ks, KS_TXCR);
+
+ /* Disables QMU Transmit (TXCR). */
+ w &= ~TXCR_TXE;
+ ks_wrreg16(ks, KS_TXCR, w);
+
+ /* Disables QMU Receive (RXCR1). */
+ w = ks_rdreg16(ks, KS_RXCR1);
+ w &= ~RXCR1_RXE ;
+ ks_wrreg16(ks, KS_RXCR1, w);
+
+ ks->enabled = false;
+
+} /* ks_disable */
+
+static void ks_setup(struct ks_net *ks)
+{
+ u16 w;
+
+ /**
+ * Configure QMU Transmit
+ */
+
+ /* Setup Transmit Frame Data Pointer Auto-Increment (TXFDPR) */
+ ks_wrreg16(ks, KS_TXFDPR, TXFDPR_TXFPAI);
+
+ /* Setup Receive Frame Data Pointer Auto-Increment */
+ ks_wrreg16(ks, KS_RXFDPR, RXFDPR_RXFPAI);
+
+ /* Setup Receive Frame Threshold - 1 frame (RXFCTFC) */
+ ks_wrreg16(ks, KS_RXFCTR, 1 & RXFCTR_THRESHOLD_MASK);
+
+ /* Setup RxQ Command Control (RXQCR) */
+ ks->rc_rxqcr = RXQCR_CMD_CNTL;
+ ks_wrreg16(ks, KS_RXQCR, ks->rc_rxqcr);
+
+ /**
+ * Clear the forced full-duplex bit so that the forced mode is
+ * half duplex (the default is full duplex): if auto-negotiation
+ * fails, most switches fall back to half duplex.
+ */
+
+ w = ks_rdreg16(ks, KS_P1MBCR);
+ w &= ~P1MBCR_FORCE_FDX;
+ ks_wrreg16(ks, KS_P1MBCR, w);
+
+ w = TXCR_TXFCE | TXCR_TXPE | TXCR_TXCRC | TXCR_TCGIP;
+ ks_wrreg16(ks, KS_TXCR, w);
+
+ w = RXCR1_RXFCE | RXCR1_RXBE | RXCR1_RXUE;
+
+ if (ks->promiscuous) /* bPromiscuous */
+ w |= (RXCR1_RXAE | RXCR1_RXINVF);
+ else if (ks->all_mcast) /* Multicast address passed mode */
+ w |= (RXCR1_RXAE | RXCR1_RXMAFMA | RXCR1_RXPAFMA);
+ else /* Normal mode */
+ w |= RXCR1_RXPAFMA;
+
+ ks_wrreg16(ks, KS_RXCR1, w);
+} /*ks_setup */
+
+
+static void ks_setup_int(struct ks_net *ks)
+{
+ ks->rc_ier = 0x00;
+ /* Clear the interrupts status of the hardware. */
+ ks_wrreg16(ks, KS_ISR, 0xffff);
+
+ /* Enables the interrupts of the hardware. */
+ ks->rc_ier = (IRQ_LCI | IRQ_TXI | IRQ_RXI);
+} /* ks_setup_int */
+
+static void ks_enable(struct ks_net *ks)
+{
+ u16 w;
+
+ w = ks_rdreg16(ks, KS_TXCR);
+ /* Enables QMU Transmit (TXCR). */
+ ks_wrreg16(ks, KS_TXCR, w | TXCR_TXE);
+
+ /*
+ * RX Frame Count Threshold Enable and Auto-Dequeue RXQ Frame
+ * Enable
+ */
+
+ w = ks_rdreg16(ks, KS_RXQCR);
+ ks_wrreg16(ks, KS_RXQCR, w | RXQCR_RXFCTE);
+
+ /* Enables QMU Receive (RXCR1). */
+ w = ks_rdreg16(ks, KS_RXCR1);
+ ks_wrreg16(ks, KS_RXCR1, w | RXCR1_RXE);
+ ks->enabled = true;
+} /* ks_enable */
+
+static int ks_hw_init(struct ks_net *ks)
+{
+ ks->promiscuous = 0;
+ ks->all_mcast = 0;
+ ks->mcast_lst_size = 0;
+
+ ks->frame_head_info = MALLOC(sizeof(struct type_frame_head) *
+ MAX_RECV_FRAMES);
+ if (!ks->frame_head_info) {
+ printk(KERN_ERR "Error: Fail to allocate frame memory\n");
+ return false;
+ }
+
+ ks_set_mac(ks, KS_DEFAULT_MAC_ADDRESS);
+ return true;
+}
+
+
+static int __devinit ks8851_probe(struct platform_device *pdev)
+{
+ int err = -ENOMEM;
+ struct resource *io_d, *io_c;
+ struct net_device *netdev;
+ struct ks_net *ks;
+ u16 id, data;
+
+ io_d = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ io_c = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+
+ if (!request_mem_region(io_d->start, resource_size(io_d), DRV_NAME))
+ goto err_mem_region;
+
+ if (!request_mem_region(io_c->start, resource_size(io_c), DRV_NAME))
+ goto err_mem_region1;
+
+ netdev = alloc_etherdev(sizeof(struct ks_net));
+ if (!netdev)
+ goto err_alloc_etherdev;
+
+ SET_NETDEV_DEV(netdev, &pdev->dev);
+
+ ks = netdev_priv(netdev);
+ ks->netdev = netdev;
+ ks->hw_addr = ioremap(io_d->start, resource_size(io_d));
+
+ if (!ks->hw_addr)
+ goto err_ioremap;
+
+ ks->hw_addr_cmd = ioremap(io_c->start, resource_size(io_c));
+ if (!ks->hw_addr_cmd)
+ goto err_ioremap1;
+
+ ks->irq = platform_get_irq(pdev, 0);
+
+ if (ks->irq < 0) {
+ err = ks->irq;
+ goto err_get_irq;
+ }
+
+ ks->pdev = pdev;
+
+ mutex_init(&ks->lock);
+ spin_lock_init(&ks->statelock);
+
+ netdev->netdev_ops = &ks_netdev_ops;
+ netdev->ethtool_ops = &ks_ethtool_ops;
+
+ /* setup mii state */
+ ks->mii.dev = netdev;
+ ks->mii.phy_id = 1;
+ ks->mii.phy_id_mask = 1;
+ ks->mii.reg_num_mask = 0xf;
+ ks->mii.mdio_read = ks_phy_read;
+ ks->mii.mdio_write = ks_phy_write;
+
+ ks_info(ks, "message enable is %d\n", msg_enable);
+ /* set the default message enable */
+ ks->msg_enable = netif_msg_init(msg_enable, (NETIF_MSG_DRV |
+ NETIF_MSG_PROBE |
+ NETIF_MSG_LINK));
+ ks_read_config(ks);
+
+ /* simple check for a valid chip being connected to the bus */
+ if ((ks_rdreg16(ks, KS_CIDER) & ~CIDER_REV_MASK) != CIDER_ID) {
+ ks_err(ks, "failed to read device ID\n");
+ err = -ENODEV;
+ goto err_register;
+ }
+
+ if (ks_read_selftest(ks)) {
+ ks_err(ks, "memory selftest failed\n");
+ err = -ENODEV;
+ goto err_register;
+ }
+
+ err = register_netdev(netdev);
+ if (err)
+ goto err_register;
+
+ platform_set_drvdata(pdev, netdev);
+
+ ks_soft_reset(ks, GRR_GSR);
+ ks_hw_init(ks);
+ ks_disable(ks);
+ ks_setup(ks);
+ ks_setup_int(ks);
+ ks_enable_int(ks);
+ ks_enable(ks);
+ memcpy(netdev->dev_addr, ks->mac_addr, 6);
+
+ data = ks_rdreg16(ks, KS_OBCR);
+ ks_wrreg16(ks, KS_OBCR, data | OBCR_ODS_16MA);
+
+ /**
+ * If you want to use the default MAC addr,
+ * comment out the 2 functions below.
+ */
+
+ random_ether_addr(netdev->dev_addr);
+ ks_set_mac(ks, netdev->dev_addr);
+
+ id = ks_rdreg16(ks, KS_CIDER);
+
+ printk(KERN_INFO DRV_NAME
+ " Found chip, family: 0x%x, id: 0x%x, rev: 0x%x\n",
+ (id >> 8) & 0xff, (id >> 4) & 0xf, (id >> 1) & 0x7);
+ return 0;
+
+err_register:
+err_get_irq:
+ iounmap(ks->hw_addr_cmd);
+err_ioremap1:
+ iounmap(ks->hw_addr);
+err_ioremap:
+ free_netdev(netdev);
+err_alloc_etherdev:
+ release_mem_region(io_c->start, resource_size(io_c));
+err_mem_region1:
+ release_mem_region(io_d->start, resource_size(io_d));
+err_mem_region:
+ return err;
+}
+
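+static int __devexit ks8851_remove(struct platform_device *pdev)
+{
+ struct net_device *netdev = platform_get_drvdata(pdev);
+ struct ks_net *ks = netdev_priv(netdev);
+ struct resource *io_d = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ struct resource *io_c = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+
+ /* undo everything probe set up, including the command register mapping */
+ unregister_netdev(netdev);
+ iounmap(ks->hw_addr_cmd);
+ iounmap(ks->hw_addr);
+ free_netdev(netdev);
+ release_mem_region(io_c->start, resource_size(io_c));
+ release_mem_region(io_d->start, resource_size(io_d));
+ platform_set_drvdata(pdev, NULL);
+ return 0;
+}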
+
+static struct platform_driver ks8851_platform_driver = {
+ .driver = {
+ .name = DRV_NAME,
+ .owner = THIS_MODULE,
+ },
+ .probe = ks8851_probe,
+ .remove = __devexit_p(ks8851_remove),
+};
+
+static int __init ks8851_init(void)
+{
+ return platform_driver_register(&ks8851_platform_driver);
+}
+
+static void __exit ks8851_exit(void)
+{
+ platform_driver_unregister(&ks8851_platform_driver);
+}
+
+module_init(ks8851_init);
+module_exit(ks8851_exit);
+
+MODULE_DESCRIPTION("KS8851 MLL Network driver");
+MODULE_AUTHOR("David Choi <david.choi@micrel.com>");
+MODULE_LICENSE("GPL");
+module_param_named(message, msg_enable, int, 0);
+MODULE_PARM_DESC(message, "Message verbosity level (0=none, 31=all)");
+
--- linux-2.6.31-rc3/drivers/net/Makefile.orig 2009-09-08 09:11:31.000000000 -0700
+++ linux-2.6.31-rc3/drivers/net/Makefile 2009-07-28 13:49:12.000000000 -0700
@@ -88,6 +88,7 @@ obj-$(CONFIG_SKGE) += skge.o
obj-$(CONFIG_SKY2) += sky2.o
obj-$(CONFIG_SKFP) += skfp/
obj-$(CONFIG_KS8842) += ks8842.o
+obj-$(CONFIG_KS8851) += ks8851_mll.o
obj-$(CONFIG_VIA_RHINE) += via-rhine.o
obj-$(CONFIG_VIA_VELOCITY) += via-velocity.o
obj-$(CONFIG_ADAPTEC_STARFIRE) += starfire.o
--- linux-2.6.31-rc3/drivers/net/Kconfig.orig 2009-09-08 09:10:28.000000000 -0700
+++ linux-2.6.31-rc3/drivers/net/Kconfig 2009-09-08 09:09:38.000000000 -0700
@@ -1729,6 +1729,12 @@ config KS8842
help
This platform driver is for Micrel KSZ8842 chip.
+config KS8851
+ tristate "Micrel KSZ8851"
+ depends on HAS_IOMEM
+ help
+ This platform driver is for Micrel KSZ8851 MLL chip.
+
config VIA_RHINE
tristate "VIA Rhine support"
depends on NET_PCI && PCI
Regards,
David J. Choi
^ permalink raw reply
* [IB] 2.6.31-rc9: SW2HW_EQ failed on Dell R610
From: Christoph Lameter @ 2009-09-08 22:06 UTC (permalink / raw)
To: Roland Dreier; +Cc: netdev
The problem with the interrupts is not solved in rc9:
[ 7.747804] mlx4_core: Mellanox ConnectX core driver v0.01 (May 1, 2007)
[ 7.747806] mlx4_core: Initializing 0000:04:00.0
[ 7.747839] mlx4_core 0000:04:00.0: PCI INT A -> GSI 38 (level, low) -> IRQ 38
[ 7.747850] mlx4_core 0000:04:00.0: setting latency timer to 64
[ 9.759505] mlx4_core 0000:04:00.0: irq 62 for MSI/MSI-X
[ 9.759513] mlx4_core 0000:04:00.0: irq 63 for MSI/MSI-X
[ 9.759520] mlx4_core 0000:04:00.0: irq 64 for MSI/MSI-X
[ 9.759527] mlx4_core 0000:04:00.0: irq 65 for MSI/MSI-X
[ 9.759533] mlx4_core 0000:04:00.0: irq 66 for MSI/MSI-X
[ 9.759540] mlx4_core 0000:04:00.0: irq 67 for MSI/MSI-X
[ 9.759547] mlx4_core 0000:04:00.0: irq 68 for MSI/MSI-X
[ 9.759555] mlx4_core 0000:04:00.0: irq 69 for MSI/MSI-X
[ 9.759561] mlx4_core 0000:04:00.0: irq 70 for MSI/MSI-X
[ 9.759569] mlx4_core 0000:04:00.0: irq 71 for MSI/MSI-X
[ 9.759576] mlx4_core 0000:04:00.0: irq 72 for MSI/MSI-X
[ 9.759583] mlx4_core 0000:04:00.0: irq 73 for MSI/MSI-X
[ 9.759590] mlx4_core 0000:04:00.0: irq 74 for MSI/MSI-X
[ 9.759596] mlx4_core 0000:04:00.0: irq 75 for MSI/MSI-X
[ 9.759603] mlx4_core 0000:04:00.0: irq 76 for MSI/MSI-X
[ 9.759611] mlx4_core 0000:04:00.0: irq 77 for MSI/MSI-X
[ 9.759617] mlx4_core 0000:04:00.0: irq 78 for MSI/MSI-X
[ 9.759624] mlx4_core 0000:04:00.0: irq 79 for MSI/MSI-X
[ 9.759631] mlx4_core 0000:04:00.0: irq 80 for MSI/MSI-X
[ 9.759638] mlx4_core 0000:04:00.0: irq 81 for MSI/MSI-X
[ 9.759645] mlx4_core 0000:04:00.0: irq 82 for MSI/MSI-X
[ 9.759652] mlx4_core 0000:04:00.0: irq 83 for MSI/MSI-X
[ 9.759658] mlx4_core 0000:04:00.0: irq 84 for MSI/MSI-X
[ 9.759666] mlx4_core 0000:04:00.0: irq 85 for MSI/MSI-X
[ 9.759672] mlx4_core 0000:04:00.0: irq 86 for MSI/MSI-X
[ 9.759679] mlx4_core 0000:04:00.0: irq 87 for MSI/MSI-X
[ 9.759686] mlx4_core 0000:04:00.0: irq 88 for MSI/MSI-X
[ 9.759692] mlx4_core 0000:04:00.0: irq 89 for MSI/MSI-X
[ 9.759699] mlx4_core 0000:04:00.0: irq 90 for MSI/MSI-X
[ 9.759706] mlx4_core 0000:04:00.0: irq 91 for MSI/MSI-X
[ 9.759712] mlx4_core 0000:04:00.0: irq 92 for MSI/MSI-X
[ 9.759720] mlx4_core 0000:04:00.0: irq 93 for MSI/MSI-X
[ 9.759726] mlx4_core 0000:04:00.0: irq 94 for MSI/MSI-X
[ 10.044580] mlx4_core 0000:04:00.0: SW2HW_EQ failed (-5)
[ 10.058011] mlx4_core 0000:04:00.0: Failed to initialize event queue table, aborting.
[ 10.076589] mlx4_core 0000:04:00.0: PCI INT A disabled
[ 10.086805] mlx4_core: probe of 0000:04:00.0 failed with error -5
^ permalink raw reply
* Re: [PATCH] slub: fix slab_pad_check()
From: Paul E. McKenney @ 2009-09-08 22:20 UTC (permalink / raw)
To: Christoph Lameter
Cc: Eric Dumazet, Pekka Enberg, Zdenek Kabelac, Patrick McHardy,
Robin Holt, Linux Kernel Mailing List, Jesper Dangaard Brouer,
Linux Netdev List, Netfilter Developers
In-Reply-To: <alpine.DEB.1.10.0909081555410.26382@V090114053VZO-1>
On Tue, Sep 08, 2009 at 03:57:04PM -0400, Christoph Lameter wrote:
> On Fri, 4 Sep 2009, Paul E. McKenney wrote:
>
> > We have gotten along fine with only SLAB_DESTROY_BY_RCU for almost
> > five years, so I think we are plenty fine with what we have. So, as
> > you say, "as the need arises".
>
> These were the glory years where SLAB_DESTROY_BY_RCU was only used for
> anonymous vmas. Now Eric has picked it up for the net subsystem. You may
> see the RCU use proliferate.
>
> The kmem_cache_destroy rcu barriers did not matter until
> SLAB_DESTROY_BY_RCU spread.
Certainly it is true that increased use of RCU has resulted in new
requirements, which have in turn led to any number of changes over
the years.
Are you saying that people have already asked you for additional
variants of SLAB_DESTROY_BY_RCU? If so, please don't keep them a secret!
Otherwise, experience indicates that it is best to wait for the new uses,
because they usually aren't quite what one might expect.
Thanx, Paul
^ permalink raw reply
* UDP regression with packets rates < 10k per sec
From: Christoph Lameter @ 2009-09-08 22:38 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
Looks like we have a regression since 2.6.22 due to latency increases in
the network stack? The following is the result of measuring latencies for
UDP multicast traffic at packet rates of 10pps, 100pps, 1kpps, 10kpps and
100kpps. Two systems running "mcast -n1 -r<rate>" (mcast tool from
http://gentwo.org/ll).
Measurements in microseconds for one hop using bnx2 on Dell R610 (64 bit
2.6.31-rc9) and Dell 1950 (32 bit 2.6.22.19 3.3GHz). Dell R610 RX usecs
tuned to 0. 32 bit tuned to 1 (NIC is flaky at 0).
Kernel               10pps  100pps  1kpps  10kpps  100kpps
---------------------------------------------------------------
2.6.22 (32bit)       30     29.5    29     30      41
2.6.31-rc9 (64 bit)  64     63      46     30      40
The only minor improvement is at a rate of 100kpps. All rates
lower than 10k regress significantly.
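To put numbers on the regression, the per-rate difference can be worked out directly from the table. The following is only a small sketch of that arithmetic; the array and function names are made up for illustration and the figures are copied from the measurements above.

```c
#include <stddef.h>

/* Latencies in microseconds for one hop, copied from the table above.
 * Index order: 10pps, 100pps, 1kpps, 10kpps, 100kpps. */
static const double lat_2_6_22[] = { 30.0, 29.5, 29.0, 30.0, 41.0 };
static const double lat_2_6_31[] = { 64.0, 63.0, 46.0, 30.0, 40.0 };

/* Extra latency of 2.6.31-rc9 over 2.6.22 at rate index i.
 * A negative result means 2.6.31-rc9 was faster at that rate. */
static double latency_delta_us(size_t i)
{
	return lat_2_6_31[i] - lat_2_6_22[i];
}

/* The same difference expressed as a percentage of the 2.6.22 figure. */
static double latency_delta_pct(size_t i)
{
	return 100.0 * latency_delta_us(i) / lat_2_6_22[i];
}
```

By this arithmetic the 10pps case is 34 usecs slower (a little over double), 10kpps is unchanged, and 100kpps is 1 usec faster.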
Could there be something wrong with the bnx2 interrupt routing? They all
end up on cpu0 here. There are 8 of them in a system with 16 "processors".
How do those need to be configured? There are some sparse comments in
Documentation/networking/multiqueue.txt but the text does not say anything
about the irq routing.
^ permalink raw reply
* Re: [PATCH] slub: fix slab_pad_check()
From: Christoph Lameter @ 2009-09-08 22:41 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Eric Dumazet, Pekka Enberg, Zdenek Kabelac, Patrick McHardy,
Robin Holt, Linux Kernel Mailing List, Jesper Dangaard Brouer,
Linux Netdev List, Netfilter Developers
In-Reply-To: <20090908222036.GM6753@linux.vnet.ibm.com>
On Tue, 8 Sep 2009, Paul E. McKenney wrote:
> Are you saying that people have already asked you for additional
> variants of SLAB_DESTROY_BY_RCU? If so, please don't keep them a secret!
> Otherwise, experience indicates that it is best to wait for the new uses,
> because they usually aren't quite what one might expect.
No direct request but I have seen the network developers discover these
features and their caching benefits over the last year. It is likely that
they will try to push it into more components of the net subsystem.
^ permalink raw reply
* [PATCH] net: Fix sock_wfree() race
From: Eric Dumazet @ 2009-09-08 22:49 UTC (permalink / raw)
To: David S. Miller; +Cc: Jike Song, Parag Warudkar, linux-kernel, netdev
In-Reply-To: <4AA64A11.7090804@gmail.com>
Eric Dumazet wrote:
> Jike Song wrote:
>> On Tue, Sep 8, 2009 at 3:38 PM, Eric Dumazet<eric.dumazet@gmail.com> wrote:
>>> We decrement a refcnt while object already freed.
>>>
>>> (SLUB DEBUG poisons the zone with 0x6B pattern)
>>>
>>> You might add this patch to trigger a WARN_ON when refcnt >= 0x60000000U
>>> in sk_free() : We'll see the path trying to delete an already freed sock
>>>
>>> diff --git a/net/core/sock.c b/net/core/sock.c
>>> index 7633422..1cb85ff 100644
>>> --- a/net/core/sock.c
>>> +++ b/net/core/sock.c
>>> @@ -1058,6 +1058,7 @@ static void __sk_free(struct sock *sk)
>>>
>>> void sk_free(struct sock *sk)
>>> {
>>> + WARN_ON(atomic_read(&sk->sk_wmem_alloc) >= 0x60000000U);
>>> /*
>>> * We substract one from sk_wmem_alloc and can know if
>>> * some packets are still in some tx queue.
>>>
>>>
>> The output of dmesg with this patch applied is attached.
>>
>>
>
> Unfortunately this WARN_ON was not triggered,
> maybe freeing comes from sock_wfree()
>
> Could you try this patch instead ?
>
> Thanks
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 7633422..30469dc 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1058,6 +1058,7 @@ static void __sk_free(struct sock *sk)
>
> void sk_free(struct sock *sk)
> {
> + WARN_ON(atomic_read(&sk->sk_wmem_alloc) >= 0x60000000U);
> /*
> * We substract one from sk_wmem_alloc and can know if
> * some packets are still in some tx queue.
> @@ -1220,6 +1221,7 @@ void sock_wfree(struct sk_buff *skb)
> struct sock *sk = skb->sk;
> int res;
>
> + WARN_ON(atomic_read(&sk->sk_wmem_alloc) >= 0x60000000U);
> /* In case it might be waiting for more memory. */
> res = atomic_sub_return(skb->truesize, &sk->sk_wmem_alloc);
> if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE))
>
David, I believe the problem could come from a race in sock_wfree().
It used to have two atomic ops:
one doing the atomic_sub(skb->truesize, &sk->sk_wmem_alloc),
then one sock_put() doing the atomic_dec_and_test(&sk->sk_refcnt).
Now, if two cpus are running concurrently:
CPU 1 calling sock_wfree()
CPU 2 calling the 'final' sock_put(),
CPU 1 doing sock_wfree() might call sk->sk_write_space(sk)
while CPU 2 is already freeing the socket.
Please note I did not test this patch, it's very late here and I should get some sleep now...
Thanks
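The ordering idea behind the patch below can be modelled in user space with C11 atomics. This is only a rough sketch under stated assumptions: struct model_sock and the model_* helpers are hypothetical stand-ins, not kernel code, and the write-space callback is reduced to a writeability check. The point is that the check runs while the skb's truesize is still charged to the socket (so the socket cannot be freed under us), and the bias compensates for the bytes that are about to be released.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the two struct sock fields involved. */
struct model_sock {
	atomic_uint wmem_alloc;	/* bytes charged to in-flight tx buffers */
	unsigned int sndbuf;	/* send buffer limit */
};

/* Same test as sock_writeable(): less than half the send buffer in use. */
static int model_writeable(struct model_sock *sk)
{
	return atomic_load(&sk->wmem_alloc) < (sk->sndbuf >> 1);
}

/* Biased variant: evaluate as if 'bias' bytes had already been released. */
static int model_writeable_bias(struct model_sock *sk, unsigned int bias)
{
	return (atomic_load(&sk->wmem_alloc) - bias) < (sk->sndbuf >> 1);
}

/*
 * Model of the reordered free path: evaluate writeability first, while our
 * truesize charge still pins the socket, then drop the charge.
 * Returns 1 when the charge reached zero, i.e. the caller was the last
 * user and would be the one to free the socket.
 */
static int model_wfree(struct model_sock *sk, unsigned int truesize,
		       int *was_writeable)
{
	*was_writeable = model_writeable_bias(sk, truesize);
	/* atomic_fetch_sub() returns the value held before the subtraction. */
	return atomic_fetch_sub(&sk->wmem_alloc, truesize) == truesize;
}
```

With sndbuf = 1000 and 1000 bytes charged, releasing a 600-byte buffer reports the socket as writeable (400 < 500) even though the subtraction has not happened yet, and only the release that brings the charge to zero returns 1.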
[PATCH] net: Fix sock_wfree() race
Commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
opens a window in sock_wfree() where another cpu
might free the socket we are working on.
Fix is to call sk->sk_write_space(sk) only
while still holding a reference on sk.
Since this call is now made before the
atomic_sub(truesize, &sk->sk_wmem_alloc), we should pass truesize as
a bias for possible sk_wmem_alloc evaluations.
Reported-by: Jike Song <albcamus@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
include/linux/sunrpc/svcsock.h | 2 +-
include/net/sock.h | 9 +++++++--
net/core/sock.c | 14 +++++++-------
net/core/stream.c | 2 +-
net/dccp/output.c | 4 ++--
net/ipv4/tcp_input.c | 2 +-
net/phonet/pep-gprs.c | 4 ++--
net/phonet/pep.c | 4 ++--
net/sunrpc/svcsock.c | 8 ++++----
net/sunrpc/xprtsock.c | 10 +++++-----
net/unix/af_unix.c | 12 ++++++------
11 files changed, 38 insertions(+), 33 deletions(-)
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index 04dba23..f80ebff 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -23,7 +23,7 @@ struct svc_sock {
/* We keep the old state_change and data_ready CB's here */
void (*sk_ostate)(struct sock *);
void (*sk_odata)(struct sock *, int bytes);
- void (*sk_owspace)(struct sock *);
+ void (*sk_owspace)(struct sock *, unsigned int bias);
/* private TCP part */
u32 sk_reclen; /* length of record */
diff --git a/include/net/sock.h b/include/net/sock.h
index 950409d..eee3312 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -296,7 +296,7 @@ struct sock {
/* XXX 4 bytes hole on 64 bit */
void (*sk_state_change)(struct sock *sk);
void (*sk_data_ready)(struct sock *sk, int bytes);
- void (*sk_write_space)(struct sock *sk);
+ void (*sk_write_space)(struct sock *sk, unsigned int bias);
void (*sk_error_report)(struct sock *sk);
int (*sk_backlog_rcv)(struct sock *sk,
struct sk_buff *skb);
@@ -554,7 +554,7 @@ static inline int sk_stream_wspace(struct sock *sk)
return sk->sk_sndbuf - sk->sk_wmem_queued;
}
-extern void sk_stream_write_space(struct sock *sk);
+extern void sk_stream_write_space(struct sock *sk, unsigned int bias);
static inline int sk_stream_memory_free(struct sock *sk)
{
@@ -1433,6 +1433,11 @@ static inline int sock_writeable(const struct sock *sk)
return atomic_read(&sk->sk_wmem_alloc) < (sk->sk_sndbuf >> 1);
}
+static inline int sock_writeable_bias(const struct sock *sk, unsigned int bias)
+{
+ return (atomic_read(&sk->sk_wmem_alloc) - bias) < (sk->sk_sndbuf >> 1);
+}
+
static inline gfp_t gfp_any(void)
{
return in_softirq() ? GFP_ATOMIC : GFP_KERNEL;
diff --git a/net/core/sock.c b/net/core/sock.c
index 30d5446..da672c0 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -512,7 +512,7 @@ set_sndbuf:
* Wake up sending tasks if we
* upped the value.
*/
- sk->sk_write_space(sk);
+ sk->sk_write_space(sk, 0);
break;
case SO_SNDBUFFORCE:
@@ -1230,10 +1230,10 @@ void sock_wfree(struct sk_buff *skb)
struct sock *sk = skb->sk;
int res;
- /* In case it might be waiting for more memory. */
- res = atomic_sub_return(skb->truesize, &sk->sk_wmem_alloc);
if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE))
- sk->sk_write_space(sk);
+ sk->sk_write_space(sk, skb->truesize);
+
+ res = atomic_sub_return(skb->truesize, &sk->sk_wmem_alloc);
/*
* if sk_wmem_alloc reached 0, we are last user and should
* free this sock, as sk_free() call could not do it.
@@ -1776,20 +1776,20 @@ static void sock_def_readable(struct sock *sk, int len)
read_unlock(&sk->sk_callback_lock);
}
-static void sock_def_write_space(struct sock *sk)
+static void sock_def_write_space(struct sock *sk, unsigned int bias)
{
read_lock(&sk->sk_callback_lock);
/* Do not wake up a writer until he can make "significant"
* progress. --DaveM
*/
- if ((atomic_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf) {
+ if (((atomic_read(&sk->sk_wmem_alloc) - bias) << 1) <= sk->sk_sndbuf) {
if (sk_has_sleeper(sk))
wake_up_interruptible_sync_poll(sk->sk_sleep, POLLOUT |
POLLWRNORM | POLLWRBAND);
/* Should agree with poll, otherwise some programs break */
- if (sock_writeable(sk))
+ if (sock_writeable_bias(sk, bias))
sk_wake_async(sk, SOCK_WAKE_SPACE, POLL_OUT);
}
diff --git a/net/core/stream.c b/net/core/stream.c
index a37debf..df720e9 100644
--- a/net/core/stream.c
+++ b/net/core/stream.c
@@ -25,7 +25,7 @@
*
* FIXME: write proper description
*/
-void sk_stream_write_space(struct sock *sk)
+void sk_stream_write_space(struct sock *sk, unsigned int bias)
{
struct socket *sock = sk->sk_socket;
diff --git a/net/dccp/output.c b/net/dccp/output.c
index c96119f..cf0635e 100644
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -192,14 +192,14 @@ unsigned int dccp_sync_mss(struct sock *sk, u32 pmtu)
EXPORT_SYMBOL_GPL(dccp_sync_mss);
-void dccp_write_space(struct sock *sk)
+void dccp_write_space(struct sock *sk, unsigned int bias)
{
read_lock(&sk->sk_callback_lock);
if (sk_has_sleeper(sk))
wake_up_interruptible(sk->sk_sleep);
/* Should agree with poll, otherwise some programs break */
- if (sock_writeable(sk))
+ if (sock_writeable_bias(sk, bias))
sk_wake_async(sk, SOCK_WAKE_SPACE, POLL_OUT);
read_unlock(&sk->sk_callback_lock);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index af6d6fa..bde1437 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4818,7 +4818,7 @@ static void tcp_new_space(struct sock *sk)
tp->snd_cwnd_stamp = tcp_time_stamp;
}
- sk->sk_write_space(sk);
+ sk->sk_write_space(sk, 0);
}
static void tcp_check_space(struct sock *sk)
diff --git a/net/phonet/pep-gprs.c b/net/phonet/pep-gprs.c
index d183509..cc36c31 100644
--- a/net/phonet/pep-gprs.c
+++ b/net/phonet/pep-gprs.c
@@ -38,7 +38,7 @@ struct gprs_dev {
struct sock *sk;
void (*old_state_change)(struct sock *);
void (*old_data_ready)(struct sock *, int);
- void (*old_write_space)(struct sock *);
+ void (*old_write_space)(struct sock *, unsigned int);
struct net_device *dev;
};
@@ -157,7 +157,7 @@ static void gprs_data_ready(struct sock *sk, int len)
}
}
-static void gprs_write_space(struct sock *sk)
+static void gprs_write_space(struct sock *sk, unsigned int bias)
{
struct gprs_dev *gp = sk->sk_user_data;
diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index b8252d2..d76e2ea 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -268,7 +268,7 @@ static int pipe_rcv_status(struct sock *sk, struct sk_buff *skb)
return -EOPNOTSUPP;
}
if (wake)
- sk->sk_write_space(sk);
+ sk->sk_write_space(sk, 0);
return 0;
}
@@ -394,7 +394,7 @@ static int pipe_do_rcv(struct sock *sk, struct sk_buff *skb)
case PNS_PIPE_ENABLED_IND:
if (!pn_flow_safe(pn->tx_fc)) {
atomic_set(&pn->tx_credits, 1);
- sk->sk_write_space(sk);
+ sk->sk_write_space(sk, 0);
}
if (sk->sk_state == TCP_ESTABLISHED)
break; /* Nothing to do */
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 23128ee..8c1642c 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -380,7 +380,7 @@ static void svc_sock_setbufsize(struct socket *sock, unsigned int snd,
sock->sk->sk_sndbuf = snd * 2;
sock->sk->sk_rcvbuf = rcv * 2;
sock->sk->sk_userlocks |= SOCK_SNDBUF_LOCK|SOCK_RCVBUF_LOCK;
- sock->sk->sk_write_space(sock->sk);
+ sock->sk->sk_write_space(sock->sk, 0);
release_sock(sock->sk);
#endif
}
@@ -405,7 +405,7 @@ static void svc_udp_data_ready(struct sock *sk, int count)
/*
* INET callback when space is newly available on the socket.
*/
-static void svc_write_space(struct sock *sk)
+static void svc_write_space(struct sock *sk, unsigned int bias)
{
struct svc_sock *svsk = (struct svc_sock *)(sk->sk_user_data);
@@ -422,13 +422,13 @@ static void svc_write_space(struct sock *sk)
}
}
-static void svc_tcp_write_space(struct sock *sk)
+static void svc_tcp_write_space(struct sock *sk, unsigned int bias)
{
struct socket *sock = sk->sk_socket;
if (sk_stream_wspace(sk) >= sk_stream_min_wspace(sk) && sock)
clear_bit(SOCK_NOSPACE, &sock->flags);
- svc_write_space(sk);
+ svc_write_space(sk, bias);
}
/*
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 83c73c4..11e4d35 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -262,7 +262,7 @@ struct sock_xprt {
*/
void (*old_data_ready)(struct sock *, int);
void (*old_state_change)(struct sock *);
- void (*old_write_space)(struct sock *);
+ void (*old_write_space)(struct sock *, unsigned int);
void (*old_error_report)(struct sock *);
};
@@ -1491,12 +1491,12 @@ static void xs_write_space(struct sock *sk)
* progress, otherwise we'll waste resources thrashing kernel_sendmsg
* with a bunch of small requests.
*/
-static void xs_udp_write_space(struct sock *sk)
+static void xs_udp_write_space(struct sock *sk, unsigned int bias)
{
read_lock(&sk->sk_callback_lock);
/* from net/core/sock.c:sock_def_write_space */
- if (sock_writeable(sk))
+ if (sock_writeable_bias(sk, bias))
xs_write_space(sk);
read_unlock(&sk->sk_callback_lock);
@@ -1512,7 +1512,7 @@ static void xs_udp_write_space(struct sock *sk)
* progress, otherwise we'll waste resources thrashing kernel_sendmsg
* with a bunch of small requests.
*/
-static void xs_tcp_write_space(struct sock *sk)
+static void xs_tcp_write_space(struct sock *sk, unsigned int bias)
{
read_lock(&sk->sk_callback_lock);
@@ -1535,7 +1535,7 @@ static void xs_udp_do_set_buffer_size(struct rpc_xprt *xprt)
if (transport->sndsize) {
sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
sk->sk_sndbuf = transport->sndsize * xprt->max_reqs * 2;
- sk->sk_write_space(sk);
+ sk->sk_write_space(sk, 0);
}
}
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index fc3ebb9..9f90ead 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -306,15 +306,15 @@ found:
return s;
}
-static inline int unix_writable(struct sock *sk)
+static inline int unix_writable(struct sock *sk, unsigned int bias)
{
- return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
+ return ((atomic_read(&sk->sk_wmem_alloc) - bias) << 2) <= sk->sk_sndbuf;
}
-static void unix_write_space(struct sock *sk)
+static void unix_write_space(struct sock *sk, unsigned int bias)
{
read_lock(&sk->sk_callback_lock);
- if (unix_writable(sk)) {
+ if (unix_writable(sk, bias)) {
if (sk_has_sleeper(sk))
wake_up_interruptible_sync(sk->sk_sleep);
sk_wake_async(sk, SOCK_WAKE_SPACE, POLL_OUT);
@@ -2010,7 +2010,7 @@ static unsigned int unix_poll(struct file *file, struct socket *sock, poll_table
* we set writable also when the other side has shut down the
* connection. This prevents stuck sockets.
*/
- if (unix_writable(sk))
+ if (unix_writable(sk, 0))
mask |= POLLOUT | POLLWRNORM | POLLWRBAND;
return mask;
@@ -2048,7 +2048,7 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
}
/* writable? */
- writable = unix_writable(sk);
+ writable = unix_writable(sk, 0);
if (writable) {
other = unix_peer_get(sk);
if (other) {
^ permalink raw reply related
* Re: UDP regression with packets rates < 10k per sec
From: Eric Dumazet @ 2009-09-08 22:52 UTC (permalink / raw)
To: Christoph Lameter; +Cc: netdev
In-Reply-To: <alpine.DEB.1.10.0909081820030.7733@V090114053VZO-1>
Christoph Lameter wrote:
> Looks like we have a regression since 2.6.22 due to latency increases in
> the network stack? The following is the result of measuring latencies for
> UDP multicast traffic at packet rates of 10pps, 100pps, 1kpps, 10kpps and
> 100kpps. Two systems running "mcast -n1 -r<rate>" (mcast tool from
> http://gentwo.org/ll).
>
> Measurements in microseconds for one hop using bnx2 on Dell R610 (64 bit
> 2.6.31-rc9) and Dell 1950 (32 bit 2.6.22.19 3.3GHz). Dell R610 RX usecs
> tuned to 0. 32 bit tuned to 1 (NIC is flaky at 0).
>
> Kernel               10pps  100pps  1kpps  10kpps  100kpps
> ---------------------------------------------------------------
> 2.6.22 (32bit)       30     29.5    29     30      41
> 2.6.31-rc9 (64 bit)  64     63      46     30      40
>
> The only minor improvement is at a rate of 100kpps. All rates
> lower than 10k regress significantly.
>
> Could there be something wrong with the bnx2 interrupt routing? They all
> end up on cpu0 here. There are 8 of them in a system with 16 "processors".
> How do those need to be configured? There are some sparse comments in
> Documentation/networking/multiqueue.txt but the text does not say anything
> about the irq routing.
>
>
Hi Christoph
In order to reproduce this here, could you tell me if you use
Producer linux-2.6.22 -> Receiver 2.6.22
Producer linux-2.6.31 -> Receiver 2.6.31
Or a mix of
Producer linux-2.6.31 -> Receiver 2.6.22
Producer linux-2.6.22 -> Receiver 2.6.31-rc9
It is not clear what your exact setup is.
Thanks
^ permalink raw reply
* Re: [PATCH] slub: fix slab_pad_check()
From: Paul E. McKenney @ 2009-09-08 22:59 UTC (permalink / raw)
To: Christoph Lameter
Cc: Eric Dumazet, Pekka Enberg, Zdenek Kabelac, Patrick McHardy,
Robin Holt, Linux Kernel Mailing List, Jesper Dangaard Brouer,
Linux Netdev List, Netfilter Developers
In-Reply-To: <alpine.DEB.1.10.0909081839090.7733@V090114053VZO-1>
On Tue, Sep 08, 2009 at 06:41:14PM -0400, Christoph Lameter wrote:
> On Tue, 8 Sep 2009, Paul E. McKenney wrote:
>
> > Are you saying that people have already asked you for additional
> > variants of SLAB_DESTROY_BY_RCU? If so, please don't keep them a secret!
> > Otherwise, experience indicates that it is best to wait for the new uses,
> > because they usually aren't quite what one might expect.
>
> No direct request but I have seen the network developers discover these
> features and their caching benefits over the last year. It is likely that
> they will try to push it into more components of the net subsystem.
So if they push it far enough, they might well decide that they need
a SLAB_DESTROY_BY_RCU_BH, for example. Looks like seven bits left,
so unless I am missing something, should not be a huge problem should
this need arise.
Thanx, Paul
^ permalink raw reply
* [PATCH] ath5k: do not free irq after resume when card has been removed
From: Thadeu Lima de Souza Cascardo @ 2009-09-09 0:52 UTC (permalink / raw)
To: ath5k-devel
Cc: linux-wireless, netdev, linux-kernel, me, lrodriguez, mickflemm,
jirislaby, linville, johannes, Thadeu Lima de Souza Cascardo
ath5k will try to request irq when resuming and fails if the device
(like a PCMCIA card) has been removed. The driver remove function will
then be called, trying to free the irq whose request failed, resulting in
a warning.
This solves the issue by defining a new flag for the status bitmap to
indicate when irq has been successfully requested and does not try to
release it if not.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
---
drivers/net/wireless/ath/ath5k/base.c | 13 +++++++++++--
drivers/net/wireless/ath/ath5k/base.h | 3 ++-
2 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireless/ath/ath5k/base.c b/drivers/net/wireless/ath/ath5k/base.c
index 029c1bc..c5e2d5b 100644
--- a/drivers/net/wireless/ath/ath5k/base.c
+++ b/drivers/net/wireless/ath/ath5k/base.c
@@ -553,6 +553,7 @@ ath5k_pci_probe(struct pci_dev *pdev,
ATH5K_ERR(sc, "request_irq failed\n");
goto err_free;
}
+ __set_bit(ATH_STAT_IRQREQUESTED, sc->status);
/* Initialize device */
sc->ah = ath5k_hw_attach(sc, id->driver_data);
@@ -628,6 +629,7 @@ ath5k_pci_probe(struct pci_dev *pdev,
err_ah:
ath5k_hw_detach(sc->ah);
err_irq:
+ __clear_bit(ATH_STAT_IRQREQUESTED, sc->status);
free_irq(pdev->irq, sc);
err_free:
ieee80211_free_hw(hw);
@@ -650,7 +652,10 @@ ath5k_pci_remove(struct pci_dev *pdev)
ath5k_debug_finish_device(sc);
ath5k_detach(pdev, hw);
ath5k_hw_detach(sc->ah);
- free_irq(pdev->irq, sc);
+ if (test_bit(ATH_STAT_IRQREQUESTED, sc->status)) {
+ __clear_bit(ATH_STAT_IRQREQUESTED, sc->status);
+ free_irq(pdev->irq, sc);
+ }
pci_iounmap(pdev, sc->iobase);
pci_release_region(pdev, 0);
pci_disable_device(pdev);
@@ -666,7 +671,10 @@ ath5k_pci_suspend(struct pci_dev *pdev, pm_message_t state)
ath5k_led_off(sc);
- free_irq(pdev->irq, sc);
+ if (test_bit(ATH_STAT_IRQREQUESTED, sc->status)) {
+ __clear_bit(ATH_STAT_IRQREQUESTED, sc->status);
+ free_irq(pdev->irq, sc);
+ }
pci_save_state(pdev);
pci_disable_device(pdev);
pci_set_power_state(pdev, PCI_D3hot);
@@ -699,6 +707,7 @@ ath5k_pci_resume(struct pci_dev *pdev)
ATH5K_ERR(sc, "request_irq failed\n");
goto err_no_irq;
}
+ __set_bit(ATH_STAT_IRQREQUESTED, sc->status);
ath5k_led_enable(sc);
return 0;
diff --git a/drivers/net/wireless/ath/ath5k/base.h b/drivers/net/wireless/ath/ath5k/base.h
index f9b7f2f..4a71437 100644
--- a/drivers/net/wireless/ath/ath5k/base.h
+++ b/drivers/net/wireless/ath/ath5k/base.h
@@ -137,12 +137,13 @@ struct ath5k_softc {
size_t desc_len; /* size of TX/RX descriptors */
u16 cachelsz; /* cache line size */
- DECLARE_BITMAP(status, 5);
+ DECLARE_BITMAP(status, 6);
#define ATH_STAT_INVALID 0 /* disable hardware accesses */
#define ATH_STAT_MRRETRY 1 /* multi-rate retry support */
#define ATH_STAT_PROMISC 2
#define ATH_STAT_LEDSOFT 3 /* enable LED gpio status */
#define ATH_STAT_STARTED 4 /* opened & irqs enabled */
+#define ATH_STAT_IRQREQUESTED 5 /* irq requested */
unsigned int filter_flags; /* HW flags, AR5K_RX_FILTER_* */
unsigned int curmode; /* current phy mode */
--
1.6.3.3
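The guard this patch adds can be sketched outside the kernel in plain C. This is an illustrative model only: struct model_dev and the model_* helpers are hypothetical names, not ath5k code, a bool stands in for the ATH_STAT_IRQREQUESTED bit, and a counter stands in for the irq actually being held.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver state touched by the patch. */
struct model_dev {
	bool irq_requested;	/* plays the role of ATH_STAT_IRQREQUESTED */
	int irq_held;		/* 1 while the irq is held, 0 otherwise */
};

/* Models request_irq() in probe/resume: the flag is set only on success. */
static int model_request_irq(struct model_dev *d, bool hw_present)
{
	if (!hw_present)
		return -1;	/* request failed, flag stays clear */
	d->irq_held++;
	d->irq_requested = true;
	return 0;
}

/* Models the guarded free_irq() in suspend/remove. */
static void model_free_irq(struct model_dev *d)
{
	if (!d->irq_requested)
		return;		/* nothing was requested, nothing to free */
	d->irq_requested = false;
	d->irq_held--;
}
```

Running the sequence probe, suspend, failed resume (card removed), remove through this model leaves irq_held at 0 and never drives it negative, which is the double free the commit message describes.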
^ permalink raw reply related
* Re: [IB] 2.6.31-rc9: SW2HW_EQ failed on Dell R610
From: Roland Dreier @ 2009-09-09 3:11 UTC (permalink / raw)
To: Christoph Lameter; +Cc: netdev
In-Reply-To: <alpine.DEB.1.10.0909081805110.7733@V090114053VZO-1>
> The problem with the interrupts is not solved in rc9:
Yes, by the time we got this all resolved, it was already rc8 and I
thought 2.6.31 was right around the corner ... the issue didn't seem
severe enough to stick in so late in the cycle, so I queued the patch
for .32 with a cc to stable to get in 2.6.31.1.
The workaround of limiting possible cpus to 16 should be OK for the time
between 2.6.31 and 2.6.31.1.
- R.
^ permalink raw reply
* Re: [PATCH] ath5k: do not free irq after resume when card has been removed
From: Bob Copeland @ 2009-09-09 3:32 UTC (permalink / raw)
To: Thadeu Lima de Souza Cascardo
Cc: ath5k-devel-xDcbHBWguxEUs3QNXV6qNA,
linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
lrodriguez-DlyHzToyqoxBDgjK7y7TUQ,
mickflemm-Re5JQEeQqe8AvxtiuMwx3w,
jirislaby-Re5JQEeQqe8AvxtiuMwx3w, linville-2XuSBdqkA4R54TAoqtyWWQ,
johannes-cdvu00un1VgdHxzADdlk8Q
In-Reply-To: <1252457551-4909-1-git-send-email-cascardo-DmMZpsCg3uxeGPcbtGPokg@public.gmane.org>
On Tue, Sep 08, 2009 at 09:52:31PM -0300, Thadeu Lima de Souza Cascardo wrote:
> ath5k will try to request irq when resuming and fails if the device
> (like a PCMCIA card) has been removed.
That's not true, ath5k no longer requests an irq when resuming.
> This solves the issue by defining a new flag for the status bitmap to
> indicate when irq has been successfully requested and does not try to
> release it if not.
I'd rather not fix it with a status bit. What kernel is this against?
--
Bob Copeland %% www.bobcopeland.com
^ permalink raw reply
* r8169 ethernet hangs after a pm-suspend (and resume)
From: Alex Bennee @ 2009-09-09 7:13 UTC (permalink / raw)
To: lkml, Francois Romieu, netdev
Hi,
I've just recently gotten suspend working on my system. Unfortunately
after the resume event I lose access to the network.
As far as the system is concerned the network is configured properly
but every attempt to ping local nodes fails with "Host not reachable".
I've also seen an oops or two but I don't know if that is related:
[ 289.816066] ------------[ cut here ]------------
[ 289.816077] WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0x132/0x1da()
[ 289.816080] Hardware name: System Product Name
[ 289.816083] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[ 289.816085] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat joydev usb_storage usbhid usb_libusual bridge stp llc bnep rfcomm l2cap bluetooth ipv6 snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device kvm_intel kvm acpi_cpufreq snd_hda_codec_analog uhci_hcd usbcore snd_hda_intel snd_hda_codec firewire_ohci snd_hwdep snd_pcm snd_timer snd firewire_core crc_itu_t soundcore snd_page_alloc ide_cd_mod pcspkr evdev cdrom thermal processor nls_base unix [last unloaded: ehci_hcd]
[ 289.816135] Pid: 0, comm: swapper Not tainted 2.6.31-rc9-ajb-00012-g3ff323f-dirty #84
[ 289.816138] Call Trace:
[ 289.816140] <IRQ> [<ffffffff812aef27>] ? dev_watchdog+0x132/0x1da
[ 289.816152] [<ffffffff8103eb72>] warn_slowpath_common+0x7c/0xa9
[ 289.816157] [<ffffffff8103ec1e>] warn_slowpath_fmt+0x69/0x6b
[ 289.816165] [<ffffffffa0124cbb>] ? uhci_scan_schedule+0x194/0x86a [uhci_hcd]
[ 289.816169] [<ffffffff81048fbc>] ? lock_timer_base+0x2b/0x4f
[ 289.816174] [<ffffffff81049699>] ? mod_timer+0x111/0x123
[ 289.816180] [<ffffffffa0125d9a>] ? uhci_hub_status_data+0x16e/0x17d [uhci_hcd]
[ 289.816185] [<ffffffff8129d98d>] ? netdev_drivername+0x48/0x4f
[ 289.816189] [<ffffffff812aef27>] dev_watchdog+0x132/0x1da
[ 289.816211] [<ffffffffa00f0233>] ? usb_hcd_poll_rh_status+0x144/0x153 [usbcore]
[ 289.816215] [<ffffffff812aedf5>] ? dev_watchdog+0x0/0x1da
[ 289.816220] [<ffffffff81048d76>] run_timer_softirq+0x198/0x20d
[ 289.816226] [<ffffffff8101d0c6>] ? lapic_next_event+0x1d/0x21
[ 289.816231] [<ffffffff8104464f>] __do_softirq+0xd6/0x19a
[ 289.816235] [<ffffffff8100c19c>] call_softirq+0x1c/0x28
[ 289.816239] [<ffffffff8100d51d>] do_softirq+0x39/0x77
[ 289.816243] [<ffffffff8104430c>] irq_exit+0x44/0x7e
[ 289.816248] [<ffffffff8130b164>] smp_apic_timer_interrupt+0x8d/0x9b
[ 289.816253] [<ffffffff8100bb73>] apic_timer_interrupt+0x13/0x20
[ 289.816256] <EOI> [<ffffffff810117ac>] ? mwait_idle+0xb9/0xf0
[ 289.816264] [<ffffffff81309645>] ? atomic_notifier_call_chain+0x13/0x15
[ 289.816268] [<ffffffff8100a30a>] ? cpu_idle+0x57/0x98
[ 289.816273] [<ffffffff812f5422>] ? rest_init+0x66/0x68
[ 289.816278] [<ffffffff815319da>] ? start_kernel+0x343/0x34e
[ 289.816283] [<ffffffff8153103a>] ? x86_64_start_reservations+0xaa/0xae
[ 289.816287] [<ffffffff8153111f>] ? x86_64_start_kernel+0xe1/0xe8
[ 289.816290] ---[ end trace 01c3a2a7a5f34536 ]---
[ 290.635368] r8169: eth0: link up
[ 314.635844] r8169: eth0: link up
I'm currently running 2.6.31-rc9-ajb-00012-g3ff323f-dirty and am
willing to test any patches that might be going.
My card is:
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
Subsystem: ASUSTeK Computer Inc. Device 81aa
Flags: bus master, fast devsel, latency 0, IRQ 25
I/O ports at e800 [size=256]
Memory at dffff000 (64-bit, non-prefetchable) [size=4K]
Memory at deff0000 (64-bit, prefetchable) [size=64K]
Expansion ROM at dffc0000 [disabled] [size=128K]
Capabilities: [40] Power Management version 2
Capabilities: [50] Message Signalled Interrupts: Mask- 64bit- Count=1/1 Enable+
Capabilities: [70] Express Endpoint, MSI 08
Capabilities: [b0] MSI-X: Enable- Mask- TabSize=2
Capabilities: [d0] Vital Product Data <?>
Kernel driver in use: r8169
--
Alex, homepage: http://www.bennee.com/~alex/
http://www.half-llama.co.uk
^ permalink raw reply