* Re: [PATCH] epic100: Test __BIG_ENDIAN instead of (non-existent) CONFIG_BIG_ENDIAN
From: David Miller @ 2010-06-03 10:29 UTC (permalink / raw)
To: jeff; +Cc: rdreier, netdev
In-Reply-To: <4C06F27B.2080704@garzik.org>
From: Jeff Garzik <jeff@garzik.org>
Date: Wed, 02 Jun 2010 20:08:27 -0400
> On 06/02/2010 04:36 PM, Roland Dreier wrote:
>> Probably no one has used this driver on big-endian systems, since it
>> was
>> setting up descriptor swapping if CONFIG_BIG_ENDIAN is set, which it
>> never is, since that symbol is not mentioned anywhere else in the
>> kernel
>> source. Switch this test to a check for __BIG_ENDIAN so it has a
>> chance
>> at working.
>>
>> Signed-off-by: Roland Dreier<rolandd@cisco.com>
>
> Acked-by: Jeff Garzik <jgarzik@redhat.com>
Applied.
^ permalink raw reply
* Re: [PATCH] sfc: Store port number in net_device::dev_id
From: David Miller @ 2010-06-03 10:29 UTC (permalink / raw)
To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1275511200.2115.23.camel@achroite.uk.solarflarecom.com>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 02 Jun 2010 21:39:56 +0100
> This exposes the port number to userland through sysfs.
>
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
> Following on from the earlier discussion of dev_id, here's the necessary
> change for sfc. This depends on commit dd8f61d "sfc: Get port number
> from CS_PORT_NUM, not PCI function number". Please consider including
> both of these in net-2.6.
Both added to net-2.6
^ permalink raw reply
* Re: [net-2.6 PATCH] ixgbe: return IXGBE_ERR_RAR_INDEX when out of range
From: David Miller @ 2010-06-03 10:29 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, gospo, xma, donald.c.skidmore
In-Reply-To: <20100602224405.13236.13183.stgit@localhost.localdomain>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 02 Jun 2010 15:44:05 -0700
> Based on original patch from Shirley Ma <xma@us.ibm.com>
> Return IXGBE_ERR_RAR_INDEX when RAR index is out of range, instead of
> returning IXGBE_SUCCESS.
>
> CC: Shirley Ma <xma@us.ibm.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Acked-by: Don Skidmore <donald.c.skidmore@intel.com>
Applied.
^ permalink raw reply
* Re: [net-2.6 PATCH] e1000e: change logical negate to bitwise
From: David Miller @ 2010-06-03 10:30 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, gospo, error27, bruce.w.allan
In-Reply-To: <20100602234314.13472.35872.stgit@localhost.localdomain>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 02 Jun 2010 16:43:15 -0700
> From: Dan Carpenter <error27@gmail.com>
>
> The bitwise negate is intended here. With the logical negate the
> condition is always false.
>
> Signed-off-by: Dan Carpenter <error27@gmail.com>
> Acked-by: Bruce Allan <bruce.w.allan@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Applied.
^ permalink raw reply
* Re: [PATCH] act_pedit: access skb->data safely
From: David Miller @ 2010-06-03 10:30 UTC (permalink / raw)
To: xiaosuo; +Cc: hadi, netdev
In-Reply-To: <1275490502-22608-1-git-send-email-xiaosuo@gmail.com>
From: Changli Gao <xiaosuo@gmail.com>
Date: Wed, 2 Jun 2010 22:55:02 +0800
> access skb->data safely
>
> we should use skb_header_pointer() and skb_store_bits() to access skb->data to
> handle small or non-linear skbs.
>
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Applied.
^ permalink raw reply
* [PATCH] syncookies: remove Kconfig text line about disabled-by-default
From: Florian Westphal @ 2010-06-03 10:42 UTC (permalink / raw)
To: netdev; +Cc: Florian Westphal
syncookies default to on since
e994b7c901ded7200b525a707c6da71f2cf6d4bb
(tcp: Don't make syn cookies initial setting depend on CONFIG_SYSCTL).
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/ipv4/Kconfig | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 8e3a1fd..7c3a7d1 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -303,7 +303,7 @@ config ARPD
If unsure, say N.
config SYN_COOKIES
- bool "IP: TCP syncookie support (disabled per default)"
+ bool "IP: TCP syncookie support"
---help---
Normal TCP/IP networking is open to an attack known as "SYN
flooding". This denial-of-service attack prevents legitimate remote
@@ -328,13 +328,13 @@ config SYN_COOKIES
server is really overloaded. If this happens frequently better turn
them off.
- If you say Y here, note that SYN cookies aren't enabled by default;
- you can enable them by saying Y to "/proc file system support" and
+ If you say Y here, you can disable SYN cookies at run time by
+ saying Y to "/proc file system support" and
"Sysctl support" below and executing the command
- echo 1 >/proc/sys/net/ipv4/tcp_syncookies
+ echo 0 > /proc/sys/net/ipv4/tcp_syncookies
- at boot time after the /proc file system has been mounted.
+ after the /proc file system has been mounted.
If unsure, say N.
--
1.6.4.4
^ permalink raw reply related
* [PATCH net-next-2.6] syncookies: make v4/v6 synflood warning behaviour the same
From: Florian Westphal @ 2010-06-03 10:43 UTC (permalink / raw)
To: netdev; +Cc: Florian Westphal
both syn_flood_warning functions print a message, but
ipv4 version only prints a warning if CONFIG_SYN_COOKIES=y.
Make the v4 one behave like the v6 one.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/ipv4/tcp_ipv4.c | 24 +++++++++++++-----------
1 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 202cf09..a13f881 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -793,19 +793,20 @@ static void tcp_v4_reqsk_destructor(struct request_sock *req)
kfree(inet_rsk(req)->opt);
}
-#ifdef CONFIG_SYN_COOKIES
-static void syn_flood_warning(struct sk_buff *skb)
+static void syn_flood_warning(const struct sk_buff *skb)
{
- static unsigned long warntime;
+ const char *msg;
- if (time_after(jiffies, (warntime + HZ * 60))) {
- warntime = jiffies;
- printk(KERN_INFO
- "possible SYN flooding on port %d. Sending cookies.\n",
- ntohs(tcp_hdr(skb)->dest));
- }
-}
+#ifdef CONFIG_SYN_COOKIES
+ if (sysctl_tcp_syncookies)
+ msg = "Sending cookies";
+ else
#endif
+ msg = "Dropping request";
+
+ pr_info("TCP: Possible SYN flooding on port %d. %s.\n",
+ ntohs(tcp_hdr(skb)->dest), msg);
+}
/*
* Save and compile IPv4 options into the request_sock if needed.
@@ -1243,6 +1244,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
* evidently real one.
*/
if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
+ if (net_ratelimit())
+ syn_flood_warning(skb);
#ifdef CONFIG_SYN_COOKIES
if (sysctl_tcp_syncookies) {
want_cookie = 1;
@@ -1328,7 +1331,6 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
if (want_cookie) {
#ifdef CONFIG_SYN_COOKIES
- syn_flood_warning(skb);
req->cookie_ts = tmp_opt.tstamp_ok;
#endif
isn = cookie_v4_init_sequence(sk, skb, &req->mss);
--
1.6.4.4
^ permalink raw reply related
* [PATCH net-next-2.6] syncookies: avoid unneeded tcp header flag double check
From: Florian Westphal @ 2010-06-03 10:43 UTC (permalink / raw)
To: netdev; +Cc: Florian Westphal
caller: if (!th->rst && !th->syn && th->ack)
callee: if (!th->ack)
make the caller only check for !syn (common for 3whs), and move
the !rst / ack test to the callee.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/ipv4/syncookies.c | 2 +-
net/ipv4/tcp_ipv4.c | 2 +-
net/ipv6/syncookies.c | 2 +-
net/ipv6/tcp_ipv6.c | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 5c24db4..c9dac86 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -266,7 +266,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
struct rtable *rt;
__u8 rcv_wscale;
- if (!sysctl_tcp_syncookies || !th->ack)
+ if (!sysctl_tcp_syncookies || !th->ack || th->rst)
goto out;
if (tcp_synq_no_recent_overflow(sk) ||
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a13f881..6558dfd 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1506,7 +1506,7 @@ static struct sock *tcp_v4_hnd_req(struct sock *sk, struct sk_buff *skb)
}
#ifdef CONFIG_SYN_COOKIES
- if (!th->rst && !th->syn && th->ack)
+ if (!th->syn)
sk = cookie_v4_check(sk, skb, &(IPCB(skb)->opt));
#endif
return sk;
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index 1238370..9fcb3ec 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -174,7 +174,7 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
struct dst_entry *dst;
__u8 rcv_wscale;
- if (!sysctl_tcp_syncookies || !th->ack)
+ if (!sysctl_tcp_syncookies || !th->ack || th->rst)
goto out;
if (tcp_synq_no_recent_overflow(sk) ||
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index e487080..5887141 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1157,7 +1157,7 @@ static struct sock *tcp_v6_hnd_req(struct sock *sk,struct sk_buff *skb)
}
#ifdef CONFIG_SYN_COOKIES
- if (!th->rst && !th->syn && th->ack)
+ if (!th->syn)
sk = cookie_v6_check(sk, skb);
#endif
return sk;
--
1.6.4.4
^ permalink raw reply related
* [PATCH net-next-2.6] syncookies: update mss tables
From: Florian Westphal @ 2010-06-03 10:43 UTC (permalink / raw)
To: netdev; +Cc: Florian Westphal
- ipv6 msstab: account for ipv6 header size
- ipv4 msstab: add mss for Jumbograms.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/ipv4/syncookies.c | 38 +++++++++++++++++++-------------------
net/ipv6/syncookies.c | 39 +++++++++++++++------------------------
2 files changed, 34 insertions(+), 43 deletions(-)
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index c9dac86..a7cbcc4 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -138,23 +138,23 @@ static __u32 check_tcp_syn_cookie(__u32 cookie, __be32 saddr, __be32 daddr,
}
/*
- * This table has to be sorted and terminated with (__u16)-1.
- * XXX generate a better table.
- * Unresolved Issues: HIPPI with a 64k MSS is not well supported.
+ * MSS Values are taken from the 2009 paper
+ * 'Measuring TCP Maximum Segment Size' by S. Alcock and R. Nelson:
+ * - values 1440 to 1460 accounted for 80% of observed mss values
+ * - values outside the 536-1460 range are rare (<0.2%).
+ *
+ * Table must be sorted.
*/
static __u16 const msstab[] = {
- 64 - 1,
- 256 - 1,
- 512 - 1,
- 536 - 1,
- 1024 - 1,
- 1440 - 1,
- 1460 - 1,
- 4312 - 1,
- (__u16)-1
+ 64,
+ 512,
+ 536,
+ 1024,
+ 1440,
+ 1460,
+ 4312,
+ 8960,
};
-/* The number doesn't include the -1 terminator */
-#define NUM_MSS (ARRAY_SIZE(msstab) - 1)
/*
* Generate a syncookie. mssp points to the mss, which is returned
@@ -169,10 +169,10 @@ __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb, __u16 *mssp)
tcp_synq_overflow(sk);
- /* XXX sort msstab[] by probability? Binary search? */
- for (mssind = 0; mss > msstab[mssind + 1]; mssind++)
- ;
- *mssp = msstab[mssind] + 1;
+ for (mssind = ARRAY_SIZE(msstab) - 1; mssind ; mssind--)
+ if (mss >= msstab[mssind])
+ break;
+ *mssp = msstab[mssind];
NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT);
@@ -202,7 +202,7 @@ static inline int cookie_check(struct sk_buff *skb, __u32 cookie)
jiffies / (HZ * 60),
COUNTER_TRIES);
- return mssind < NUM_MSS ? msstab[mssind] + 1 : 0;
+ return mssind < ARRAY_SIZE(msstab) ? msstab[mssind] : 0;
}
static inline struct sock *get_cookie_sock(struct sock *sk, struct sk_buff *skb,
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index 9fcb3ec..70d330f 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -27,28 +27,17 @@ extern __u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS];
#define COOKIEBITS 24 /* Upper bits store count */
#define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1)
-/*
- * This table has to be sorted and terminated with (__u16)-1.
- * XXX generate a better table.
- * Unresolved Issues: HIPPI with a 64k MSS is not well supported.
- *
- * Taken directly from ipv4 implementation.
- * Should this list be modified for ipv6 use or is it close enough?
- * rfc 2460 8.3 suggests mss values 20 bytes less than ipv4 counterpart
- */
+/* Table must be sorted. */
static __u16 const msstab[] = {
- 64 - 1,
- 256 - 1,
- 512 - 1,
- 536 - 1,
- 1024 - 1,
- 1440 - 1,
- 1460 - 1,
- 4312 - 1,
- (__u16)-1
+ 64,
+ 512,
+ 536,
+ 1280 - 60,
+ 1480 - 60,
+ 1500 - 60,
+ 4460 - 60,
+ 9000 - 60,
};
-/* The number doesn't include the -1 terminator */
-#define NUM_MSS (ARRAY_SIZE(msstab) - 1)
/*
* This (misnamed) value is the age of syncookie which is permitted.
@@ -134,9 +123,11 @@ __u32 cookie_v6_init_sequence(struct sock *sk, struct sk_buff *skb, __u16 *mssp)
tcp_synq_overflow(sk);
- for (mssind = 0; mss > msstab[mssind + 1]; mssind++)
- ;
- *mssp = msstab[mssind] + 1;
+ for (mssind = ARRAY_SIZE(msstab) - 1; mssind ; mssind--)
+ if (mss >= msstab[mssind])
+ break;
+
+ *mssp = msstab[mssind];
NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT);
@@ -154,7 +145,7 @@ static inline int cookie_check(struct sk_buff *skb, __u32 cookie)
th->source, th->dest, seq,
jiffies / (HZ * 60), COUNTER_TRIES);
- return mssind < NUM_MSS ? msstab[mssind] + 1 : 0;
+ return mssind < ARRAY_SIZE(msstab) ? msstab[mssind] : 0;
}
struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
--
1.6.4.4
^ permalink raw reply related
* Re: [PATCH v2] netdev:bfin_mac: reclaim and free tx skb as soon as possible after transfer
From: Junchang Wang @ 2010-06-03 10:54 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1275556749.2456.24.camel@edumazet-laptop>
>
> As I said, core network takes care of three counters only, because it
> was 'free', as they share a cache line with a spinlock we must hold when
> calling xmit function.
>
> In receive path, we dont dirty a cache line in core network, so updating
> counters would add a cost. (modern NICs handle stats in firmware)
>
That's really resolve my doubts about lack of RX stats in kernel stack.
Thanks Eric.
--
--Junchang
^ permalink raw reply
* [Patch 1/2]r8169: remove rtl_rw_cpluscmd
From: Junchang Wang @ 2010-06-03 11:24 UTC (permalink / raw)
To: romieu; +Cc: davem, netdev
Some clean up work. Please correct me if any of this is wrong.
Writting "cmd" back without modification is redundant.
Secondly, because rtl_rw_cpluscmd is just encapsulation of
RTL_R16, remove rtl_rw_cpluscmd.
Signed-off-by: Junchang Wang <junchangwang@gmail.com>
---
drivers/net/r8169.c | 13 ++-----------
1 files changed, 2 insertions(+), 11 deletions(-)
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 217e709..6a37813 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -3404,15 +3404,6 @@ static void rtl_set_rx_tx_desc_registers(struct rtl8169_private *tp,
RTL_W32(RxDescAddrLow, ((u64) tp->RxPhyAddr) & DMA_BIT_MASK(32));
}
-static u16 rtl_rw_cpluscmd(void __iomem *ioaddr)
-{
- u16 cmd;
-
- cmd = RTL_R16(CPlusCmd);
- RTL_W16(CPlusCmd, cmd);
- return cmd;
-}
-
static void rtl_set_rx_max_size(void __iomem *ioaddr, unsigned int rx_buf_sz)
{
/* Low hurts. Let's disable the filtering. */
@@ -3471,7 +3462,7 @@ static void rtl_hw_start_8169(struct net_device *dev)
(tp->mac_version == RTL_GIGA_MAC_VER_04))
rtl_set_rx_tx_config_registers(tp);
- tp->cp_cmd |= rtl_rw_cpluscmd(ioaddr) | PCIMulRW;
+ tp->cp_cmd |= RTL_R16(CPlusCmd) | PCIMulRW;
if ((tp->mac_version == RTL_GIGA_MAC_VER_02) ||
(tp->mac_version == RTL_GIGA_MAC_VER_03)) {
@@ -3906,7 +3897,7 @@ static void rtl_hw_start_8101(struct net_device *dev)
rtl_set_rx_max_size(ioaddr, tp->rx_buf_sz);
- tp->cp_cmd |= rtl_rw_cpluscmd(ioaddr) | PCIMulRW;
+ tp->cp_cmd |= RTL_R16(CPlusCmd) | PCIMulRW;
RTL_W16(CPlusCmd, tp->cp_cmd);
--
^ permalink raw reply related
* [Patch 2/2]r8169: remove redundant RTL_W32
From: Junchang Wang @ 2010-06-03 11:27 UTC (permalink / raw)
To: romieu; +Cc: davem, netdev
Writting "cmd" into "CounterAddrLow" twice is redundant.
Signed-off-by: Junchang Wang <junchangwang@gmail.com>
---
drivers/net/r8169.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 217e709..e0a77a0 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -1207,7 +1207,6 @@ static void rtl8169_update_counters(struct net_device *dev)
RTL_W32(CounterAddrHigh, (u64)paddr >> 32);
cmd = (u64)paddr & DMA_BIT_MASK(32);
- RTL_W32(CounterAddrLow, cmd);
RTL_W32(CounterAddrLow, cmd | CounterDump);
while (wait--) {
--
^ permalink raw reply related
* Re: [PATCH] phylib: Add support for the LXT973 phy.
From: Richard Cochran @ 2010-06-03 11:28 UTC (permalink / raw)
To: David Miller; +Cc: afleming, netdev
In-Reply-To: <20100602.081512.256851057.davem@davemloft.net>
On Wed, Jun 02, 2010 at 08:15:12AM -0700, David Miller wrote:
> From: Richard Cochran <richardcochran@gmail.com>
> Date: Wed, 2 Jun 2010 17:08:51 +0200
>
> > Well, is it okay to just use the first patch?
>
> Wasn't there another change you were asked to make that got included
> in the v2 patch?
No, I don't think so. He did ask whether the erratum has been fixed.
I will implement the 'port' field that he has suggested.
Richard
^ permalink raw reply
* Re: [Patch 2/2] mlx4: add dynamic LRO disable support
From: Ben Hutchings @ 2010-06-03 12:37 UTC (permalink / raw)
To: Amerigo Wang; +Cc: netdev, herbert.xu, nhorman, sgruszka, davem
In-Reply-To: <20100603034312.5305.61000.sendpatchset@localhost.localdomain>
On Wed, 2010-06-02 at 23:39 -0400, Amerigo Wang wrote:
> This patch adds dynamic LRO diable support for mlx4 net driver.
> It also fixes a bug of mlx4, which checks NETIF_F_LRO flag in rx
> path without rtnl lock.
[...]
Is that flag test actually unsafe - and if so, how is testing num_lro
any better? Perhaps access to net_device::features should be wrapped
with ACCESS_ONCE() to ensure that reads and writes are atomic.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: [Patch 2/2]r8169: remove redundant RTL_W32
From: Ben Hutchings @ 2010-06-03 12:40 UTC (permalink / raw)
To: Junchang Wang; +Cc: romieu, davem, netdev
In-Reply-To: <20100603112721.GD24909@host-a-55.ustcsz.edu.cn>
On Thu, 2010-06-03 at 19:27 +0800, Junchang Wang wrote:
> Writting "cmd" into "CounterAddrLow" twice is redundant.
[...]
I assume you've checked this against the datasheets for all 20 variants
of this hardware?
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: Question about an assignment in handle_ing()
From: jamal @ 2010-06-03 12:43 UTC (permalink / raw)
To: Herbert Xu; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <20100603080130.GA29709@gondor.apana.org.au>
On Thu, 2010-06-03 at 18:01 +1000, Herbert Xu wrote:
> On Sun, May 30, 2010 at 09:29:10AM -0400, jamal wrote:
> > The packet path is:
> > -->eth0-->tcpdump eth0-->pedit-->mirror to dummy0-->tcpdump dummy0
>
> Well this doesn't guarantee a cloned packet at all. Once af_packet
> receives the packet it'll wake up any listeners like tcpdump, if
> tcpdump gets to it before pedit runs then the packet won't be
> cloned anymore.
I may be misreading, but:
This is the point i have been trying to make, Herbert;-> There is no
_guarantee_ that the first tcpdump will see the packet that came out of
eth0 instead of seeing the packet that came out the pedit part of the
pipeline. I need to see the correct packet. I know with my check
this is guaranteed.
> Anyway, I don't see why actions are special. Everybody else lives
> by the rule that cloned skbs are not writeable.
Yes, if skb_cloned() is true but it is not as i said in my earlier
email.
> So if this was
> indeed buggy as you say it would have shown up a long time ago.
Things may have been buggy - I dont know; you just validated to me that
it _may_ happen. I will be more than happy to remove it if i can get a
guarantee.
So how do we fix this?
Does af_packet need to always clone? That way i can depend on it. I have
a feeling someone will be unhappy with that. I am avoiding to clone
every packet on my part because afaik this problem doesnt exist if i
dont use tcpdump/af_packet...
> Case in point, we had a bug in certain NIC drivers where they
> modified cloned skbs for TSO. This quickly showed up as bogus
> packets in tcpdump and we fixed it.
I think this is different.
cheers,
jamal
^ permalink raw reply
* Re: Question about an assignment in handle_ing()
From: Herbert Xu @ 2010-06-03 12:47 UTC (permalink / raw)
To: jamal; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <1275569030.3445.49.camel@bigi>
On Thu, Jun 03, 2010 at 08:43:50AM -0400, jamal wrote:
>
> On Thu, 2010-06-03 at 18:01 +1000, Herbert Xu wrote:
> > On Sun, May 30, 2010 at 09:29:10AM -0400, jamal wrote:
>
> > > The packet path is:
> > > -->eth0-->tcpdump eth0-->pedit-->mirror to dummy0-->tcpdump dummy0
> >
> > Well this doesn't guarantee a cloned packet at all. Once af_packet
> > receives the packet it'll wake up any listeners like tcpdump, if
> > tcpdump gets to it before pedit runs then the packet won't be
> > cloned anymore.
>
> I may be misreading, but:
> This is the point i have been trying to make, Herbert;-> There is no
> _guarantee_ that the first tcpdump will see the packet that came out of
> eth0 instead of seeing the packet that came out the pedit part of the
> pipeline. I need to see the correct packet. I know with my check
> this is guaranteed.
You *are* misreading what I wrote.
As I've repeatedly stated, it is guaranteed that either tcpdump
is finished with the skb, or you will see it as cloned.
Anything else is a serious bug in our stack, regardless of what
pedit does.
In fact pedit is buggy today because you're modifying a packet
that may be cloned.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: Question about an assignment in handle_ing()
From: jamal @ 2010-06-03 12:53 UTC (permalink / raw)
To: Herbert Xu; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <20100603124717.GB315@gondor.apana.org.au>
On Thu, 2010-06-03 at 22:47 +1000, Herbert Xu wrote:
> You *are* misreading what I wrote.
>
> As I've repeatedly stated, it is guaranteed that either tcpdump
> is finished with the skb, or you will see it as cloned.
I dont see the problem of when tcpdump sees the packet first.
pedit doesnt care if it is cloned or not.
What i am worried about - and couldnt parse in your email - is:
What happens when pedit gets to it first? would skb_cloned() be true
always/guaranteed?
cheers,
jamal
^ permalink raw reply
* Re: Question about an assignment in handle_ing()
From: Herbert Xu @ 2010-06-03 12:56 UTC (permalink / raw)
To: jamal; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <1275569594.3445.51.camel@bigi>
On Thu, Jun 03, 2010 at 08:53:14AM -0400, jamal wrote:
>
> I dont see the problem of when tcpdump sees the packet first.
> pedit doesnt care if it is cloned or not.
> What i am worried about - and couldnt parse in your email - is:
> What happens when pedit gets to it first? would skb_cloned() be true
> always/guaranteed?
Yes that's what I've been saying all along. As long as the skb
is still held up in af_packet's socket queue, the packet will be
marked as cloned.
Only when tcpdump has done a recv(2) on the socket and liberated
the skb in the queue, will you see the clone flag turned off in
pedit.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: Question about an assignment in handle_ing()
From: Herbert Xu @ 2010-06-03 12:58 UTC (permalink / raw)
To: jamal; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <1275569594.3445.51.camel@bigi>
On Thu, Jun 03, 2010 at 08:53:14AM -0400, jamal wrote:
>
> I dont see the problem of when tcpdump sees the packet first.
> pedit doesnt care if it is cloned or not.
> What i am worried about - and couldnt parse in your email - is:
> What happens when pedit gets to it first? would skb_cloned() be true
> always/guaranteed?
Yes that's what I've been saying all along. As long as the skb
is still held in the af_packet socket queue, pedit will see the
skb as cloned. Only when tcpdump has done a recv(2) and liberated
the skb from the queue, will skb_cloned return false.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: Question about an assignment in handle_ing()
From: Herbert Xu @ 2010-06-03 13:00 UTC (permalink / raw)
To: jamal; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <1275569896.3445.53.camel@bigi>
On Thu, Jun 03, 2010 at 08:58:16AM -0400, jamal wrote:
>
> Never mind. You did answer this question above. You are saying it is
> guaranteed.
> I will send patches to remove the checks in both pedit and handle_ing().
> At some point i will test it....
Thanks Jamal!
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: Question about an assignment in handle_ing()
From: jamal @ 2010-06-03 13:01 UTC (permalink / raw)
To: Herbert Xu; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <20100603130011.GA695@gondor.apana.org.au>
On Thu, 2010-06-03 at 23:00 +1000, Herbert Xu wrote:
> On Thu, Jun 03, 2010 at 08:58:16AM -0400, jamal wrote:
> >
> > Never mind. You did answer this question above. You are saying it is
> > guaranteed.
> > I will send patches to remove the checks in both pedit and handle_ing().
> > At some point i will test it....
>
> Thanks Jamal!
And thank you as well for you patience Herbert.
cheers,
jamal
^ permalink raw reply
* Re: Question about an assignment in handle_ing()
From: jamal @ 2010-06-03 12:58 UTC (permalink / raw)
To: Herbert Xu; +Cc: Jiri Pirko, netdev, davem, kaber
In-Reply-To: <1275569594.3445.51.camel@bigi>
On Thu, 2010-06-03 at 08:53 -0400, jamal wrote:
> On Thu, 2010-06-03 at 22:47 +1000, Herbert Xu wrote:
>
> > You *are* misreading what I wrote.
> >
> > As I've repeatedly stated, it is guaranteed that either tcpdump
> > is finished with the skb, or you will see it as cloned.
>
> I dont see the problem of when tcpdump sees the packet first.
> pedit doesnt care if it is cloned or not.
> What i am worried about - and couldnt parse in your email - is:
> What happens when pedit gets to it first? would skb_cloned() be true
> always/guaranteed?
Never mind. You did answer this question above. You are saying it is
guaranteed.
I will send patches to remove the checks in both pedit and handle_ing().
At some point i will test it....
cheers,
jamal
^ permalink raw reply
* Re: [PATCH v2] netfilter: Xtables: idletimer target implementation
From: Luciano Coelho @ 2010-06-03 13:17 UTC (permalink / raw)
To: ext Jan Engelhardt
Cc: netdev@vger.kernel.org, netfilter-devel@vger.kernel.org,
kaber@trash.net, Timo Teras
In-Reply-To: <1275559981.10855.141.camel@chilepepper>
On Thu, 2010-06-03 at 12:13 +0200, Coelho Luciano (Nokia-D/Helsinki)
wrote:
> On Thu, 2010-06-03 at 09:58 +0200, ext Jan Engelhardt wrote:
> > On Thursday 2010-06-03 09:04, Luciano Coelho wrote:
> > >
> > >Looking closer, it seems that it makes a bit of sense to add a kernel
> > >module to /sys/device/system. I think it makes more sense than adding
> > >to the module class or to the net class, actually. The idletimer is not
> > >a net device (so it doesn't fit in /sys/class/net) and it is not a
> > >module, even though it may be handled by the xt_IDLETIMER module.
> > >
> > >So we can look at the xt_idletimer as a system device, which is not a
> > >peripheral device in itself, but a software timer device (there are
> > >already similar components).
> > >
> > >I'll add the kernel object we need as a system class device, so it will
> > >go under /sys/devices/system/xt_idletimer. Does that make sense to you?
> >
> > Mh.. somehow I'd pick /sys/devices/virtual/xt_idletimer.
> > Or even create a /sys/net/xt_idletimer. (/sys has conceptual
> > subsystems directly beneath it: devices, fs, kernel, ...)
>
> Yes, I think I'll use the /sys/device/virtual/misc class. That seems to
> be the place where, well, miscellaneous devices go. :) I think it fits
> pretty nicely in that concept.
Argh, it seems that I'll never end this. Pros and cons of a few
different solutions:
1) /sys/devices/virtual/misc/xt_idletimer/<user_defined_label>
The misc class is a char device, so if I add the xt_idletimer there,
I'll get lots of useless things, like file operation functions etc. And
a few attributes that make sense to char devices are also added
automatically, but will never be used with the xt_idletimer.
2) /sys/devices/virtual/xt_idletimer/
This solution seems to be okay, basically I create a new virtual class
called xt_idletimer. This will automatically create a link
to /sys/class/xt_idletimer, so the user doesn't need to know that this
is a virtual device at all. One problem is that we'll have a few more
device attributes than we need such as ./power/wakeup and ./uevent,
which we won't use.
2a) /sys/devices/virtual/xt_idletimer/<user_defined_label>/timer
Each timer set by the user will be added as a new device named
<user_defined_label> under the xt_idletimer class. Each of these
devices will have one more attribute called timer (plus the
autogenerated wakeup and uevent attributes). The drawback is that we
use a more resources than we need, since we have more kobjects and more
attributes than we need.
2b) /sys/devices/virtual/xt_idletimer/timers/<user_defined_label>
This solution is similar to 2a, but uses less resources, since we have
only one kobject with several attributes (one for each user defined
label). We still have the extra attributes, though.
3) /sys/devices/system/xt_idletimer/<user_defined_label>/timer
This solution uses less resources, because the system device class
doesn't contain any implicit attributes.
4) /sys/devices/{system,virtual}/xt_idletimer/<user_defined_label>
I tried this one, by creating the xt_idletimer class and adding
attributes directly to it. But due to restrictions in sysfs, I didn't
figure out a way to dynamically add attributes to the class. Or, more
specifically, show()ing and notify()ing those dynamic attributes don't
seem to be possible for class attributes.
So, I think the best proposals are 2b and 3. I have a slight preference
towards 3, but 2b is also fine with me. What do you think?
--
Cheers,
Luca.
^ permalink raw reply
* Re: [Patch 1/2] s2io: add dynamic LRO disable support
From: Michal Schmidt @ 2010-06-03 13:38 UTC (permalink / raw)
To: netdev; +Cc: Amerigo Wang, herbert.xu, nhorman, sgruszka, davem,
Ramkrishna Vepa
In-Reply-To: <20100603034303.5305.55552.sendpatchset@localhost.localdomain>
[adding Ram Vepa to CC]
On Wed, 2 Jun 2010 23:38:59 -0400 Amerigo Wang wrote:
>
> This patch adds dynamic LRO diable support for s2io net driver.
>
> I don't have s2io card, so only did compiling test. Anyone who wants
> to test this is more than welcome.
>
> This is based on Neil's initial work.
>
> Signed-off-by: WANG Cong <amwang@redhat.com>
> Signed-off-by: Neil Horman <nhorman@redhat.com>
> Acked-by: Neil Horman <nhorman@redhat.com>
> Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
>
>
> ---
> diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
> index 668327c..9db5df7 100644
> --- a/drivers/net/s2io.c
> +++ b/drivers/net/s2io.c
> @@ -6684,6 +6684,35 @@ static int s2io_ethtool_op_set_tso(struct
> net_device *dev, u32 data)
> return 0;
> }
> +static int s2io_ethtool_set_flags(struct net_device *dev, u32 data)
> +{
> + struct s2io_nic *sp = netdev_priv(dev);
> + int rc = 0;
> + int changed = 0;
> +
> + if (data & ETH_FLAG_LRO) {
> + if (lro_enable) {
> + if (!(dev->features & NETIF_F_LRO)) {
> + dev->features |= NETIF_F_LRO;
> + changed = 1;
> + }
> + } else
> + rc = -EINVAL;
> + } else if (dev->features & NETIF_F_LRO) {
> + dev->features &= ~NETIF_F_LRO;
> + changed = 1;
> + }
> +
> + if (changed && netif_running(dev)) {
> + s2io_stop_all_tx_queue(sp);
> + s2io_card_down(sp);
> + sp->lro = dev->features & NETIF_F_LRO;
> + rc = s2io_card_up(sp);
> + s2io_start_all_tx_queue(sp);
> + }
> +
> + return rc;
> +}
>
> static const struct ethtool_ops netdev_ethtool_ops = {
> .get_settings = s2io_ethtool_gset,
> @@ -6701,6 +6730,8 @@ static const struct ethtool_ops
> netdev_ethtool_ops = { .get_rx_csum = s2io_ethtool_get_rx_csum,
> .set_rx_csum = s2io_ethtool_set_rx_csum,
> .set_tx_csum = s2io_ethtool_op_set_tx_csum,
> + .set_flags = s2io_ethtool_set_flags,
> + .get_flags = ethtool_op_get_flags,
> .set_sg = ethtool_op_set_sg,
> .get_tso = s2io_ethtool_op_get_tso,
> .set_tso = s2io_ethtool_op_set_tso,
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox