* Re: [PATCH net-2.6] llc: fix a device refcount imbalance
From: Maxim Levitsky @ 2010-12-16 21:49 UTC (permalink / raw)
To: David Miller
Cc: eric.dumazet, linux1394-devel, stefanr, netdev, kuznet, jmorris,
kaber, opurdila, stable
In-Reply-To: <1291866395.7792.31.camel@maxim-laptop>
On Thu, 2010-12-09 at 05:46 +0200, Maxim Levitsky wrote:
> On Wed, 2010-12-08 at 09:59 -0800, David Miller wrote:
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Sun, 05 Dec 2010 13:03:26 +0100
> >
> > > [PATCH net-2.6] llc: fix a device refcount imbalance
> > >
> > > commit abf9d537fea225 (llc: add support for SO_BINDTODEVICE) added one
> > > refcount imbalance in llc_ui_bind(), because dev_getbyhwaddr() doesnt
> > > take a reference on device, while dev_get_by_index() does.
> > >
> > > Fix this using RCU locking. And since an RCU conversion will be done for
> > > 2.6.38 for dev_getbyhwaddr(), put the rcu_read_lock/unlock exactly at
> > > their final place.
> > >
> > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> >
> > Applied and queued up for -stable.
>
> Hi,
>
> Could I kindly ask if my patch is accepted?
>
> Best regards,
> Maxim Levitsky
>
Anybody?
Best regards,
Maxim Levitsky
^ permalink raw reply
* Re: [PATCH net-next-2.6] net: force a fresh timestamp for ingress packets
From: Eric Dumazet @ 2010-12-16 21:30 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: David Miller, Changli Gao, netdev, Patrick McHardy
In-Reply-To: <20101216211744.GA2191@del.dom.local>
Le jeudi 16 décembre 2010 à 22:17 +0100, Jarek Poplawski a écrit :
> On Wed, Dec 15, 2010 at 04:50:52PM +0100, Eric Dumazet wrote:
> > In commit 8caf153974f2 (net: sch_netem: Fix an inconsistency in ingress
> > netem timestamps.), Jarek added a logic to refresh timestamps of
> > ingressed packets going through netem.
> >
> > I believe we should generalize this, forcing a refresh of timestamps in
> > dev_queue_xmit_nit() for all ingress packets, whatever qdisc/class they
> > used before being delivered.
> >
> > This way, we can have a good idea when packets are delivered to our
> > stack (tcpdump -i ifb0), while a tcpdump on original device gives
> > timestamps right before ingressing.
>
> I don't think we should do it. IMHO netem on ingress is a special case,
> obviously for testing, and otherwise the real (first) timestamp might
> be valuable for some users.
Well, I find difficult to check sfq is actually correctly working
because timestamps are mixed.
After this patch, I found the SFQ allot error for example.
I dont know, I feel adding a sysctl like netdev_tstamp_prequeue is not
worth it...
^ permalink raw reply
* [RFC] ipv6: don't flush routes when setting loopback down
From: Stephen Hemminger @ 2010-12-16 21:28 UTC (permalink / raw)
To: David Miller
Cc: ebiederm, brian.haley, netdev, maheshkelkar, lorenzo, yoshfuji,
stable
In-Reply-To: <20101209.122033.183046393.davem@davemloft.net>
When loopback device is being brought down, then keep the route table
entries because they are special. The entries in the local table for
linklocal routes and ::1 address should not be purged.
This is a sub optimal solution to the problem and should be replaced
by a better fix in future.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
---
patch versus current net-next tree, but if this acceptable
it should be applied to net-2.6 as well.
--- a/net/ipv6/addrconf.c 2010-12-16 10:29:34.035408392 -0800
+++ b/net/ipv6/addrconf.c 2010-12-16 10:30:37.366834482 -0800
@@ -2669,7 +2669,9 @@ static int addrconf_ifdown(struct net_de
ASSERT_RTNL();
- rt6_ifdown(net, dev);
+ /* Flush routes if device is being removed or it is not loopback */
+ if (how || !(dev->flags & IFF_LOOPBACK))
+ rt6_ifdown(net, dev);
neigh_ifdown(&nd_tbl, dev);
idev = __in6_dev_get(dev);
^ permalink raw reply
* Re: pktgen IP address stepping
From: Joe Perches @ 2010-12-16 21:22 UTC (permalink / raw)
To: Kristian Larsson; +Cc: David S. Miller, netdev
In-Reply-To: <20101216211153.GA8658@spritelink.se>
On Thu, 2010-12-16 at 22:11 +0100, Kristian Larsson wrote:
> On Thu, Dec 16, 2010 at 10:02:12AM -0800, Joe Perches wrote:
> > pr_<foo> calls are already prefixed with pktgen via pr_fmt,
> > you don't need to add it to the format string.
> I can see at least two different cases of debug statements;
> if (debug)
> pr_info("Delay set at: %llu ns\n", pkt_dev->delay);
> if (debug)
> printk(KERN_DEBUG "pktgen: dst_min set to: %s\n",
> pkt_dev->dst_min);
> what's the reasoning behind using one or the other?
pr_info(fmt, ...) emits at KERN_INFO level and uses pr_fmt(fmt)
printk(KERN_DEBUG emits at KERN_DEBUG level and doesn't use pr_fmt
pr_debug(fmt, ...) does use pr_fmt and is optionally compiled.
I can't say why the author decided to use a info level output
for the delay set message. Maybe that person doesn't like to see
KERN_DEBUG messages.
cheers, Joe
^ permalink raw reply
* Re: [RFC PATCH] net: Implement read-only protection and COW'ing of metrics.
From: David Miller @ 2010-12-16 21:21 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
In-Reply-To: <20101216.115900.183061857.davem@davemloft.net>
From: David Miller <davem@davemloft.net>
Date: Thu, 16 Dec 2010 11:59:00 -0800 (PST)
> Hmm... perhaps we need to force-COW or revert to the default zero
> metrics for any routing cache entries with reference counts?
>
> Or maybe that's not even needed.
>
> Because nobody should try to touch metrics without first doing a
> dst->check(), especially after the RCU grace period, so it should be
> OK no?
Ok I did some audits and there are some problems in this area.
First of all I have to at least defer the kmalloc()'d metrics free
until the RCU callback.
Second of all things like tcp_update_metrics() use __sk_dst_get()
instead of __sk_dst_check().
I have an idea to use another pointer state bit to indicate that the
metrics are "dead". This would block all COW operations and writes.
Metric reads for obsolete dst's would be redirected to the read-only
all-zeros default array.
In this way we won't need to do anything different in places like
tcp_update_metrics().
I'll post a new patch once I sort all of this out.
Thanks for catching this!
^ permalink raw reply
* Re: [PATCH net-next-2.6] net: force a fresh timestamp for ingress packets
From: Jarek Poplawski @ 2010-12-16 21:17 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, Changli Gao, netdev, Patrick McHardy
In-Reply-To: <1292428252.3427.342.camel@edumazet-laptop>
On Wed, Dec 15, 2010 at 04:50:52PM +0100, Eric Dumazet wrote:
> In commit 8caf153974f2 (net: sch_netem: Fix an inconsistency in ingress
> netem timestamps.), Jarek added a logic to refresh timestamps of
> ingressed packets going through netem.
>
> I believe we should generalize this, forcing a refresh of timestamps in
> dev_queue_xmit_nit() for all ingress packets, whatever qdisc/class they
> used before being delivered.
>
> This way, we can have a good idea when packets are delivered to our
> stack (tcpdump -i ifb0), while a tcpdump on original device gives
> timestamps right before ingressing.
I don't think we should do it. IMHO netem on ingress is a special case,
obviously for testing, and otherwise the real (first) timestamp might
be valuable for some users.
Jarek P.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> net/core/dev.c | 2 +-
> net/sched/sch_netem.c | 8 --------
> 2 files changed, 1 insertion(+), 9 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index d28b3a0..a2846f8 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1506,7 +1506,7 @@ static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
> struct packet_type *ptype;
>
> #ifdef CONFIG_NET_CLS_ACT
> - if (!(skb->tstamp.tv64 && (G_TC_FROM(skb->tc_verd) & AT_INGRESS)))
> + if (!skb->tstamp.tv64 || (G_TC_FROM(skb->tc_verd) & AT_INGRESS))
> net_timestamp_set(skb);
> #else
> net_timestamp_set(skb);
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index e5593c0..1c29cc0 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -281,14 +281,6 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
> if (unlikely(!skb))
> return NULL;
>
> -#ifdef CONFIG_NET_CLS_ACT
> - /*
> - * If it's at ingress let's pretend the delay is
> - * from the network (tstamp will be updated).
> - */
> - if (G_TC_FROM(skb->tc_verd) & AT_INGRESS)
> - skb->tstamp.tv64 = 0;
> -#endif
> pr_debug("netem_dequeue: return skb=%p\n", skb);
> sch->q.qlen--;
> return skb;
>
>
^ permalink raw reply
* Re: pktgen IP address stepping
From: Kristian Larsson @ 2010-12-16 21:11 UTC (permalink / raw)
To: Joe Perches; +Cc: David S. Miller, netdev
In-Reply-To: <1292522532.29894.33.camel@Joe-Laptop>
On Thu, Dec 16, 2010 at 10:02:12AM -0800, Joe Perches wrote:
> pr_<foo> calls are already prefixed with pktgen via pr_fmt,
> you don't need to add it to the format string.
I can see at least two different cases of debug statements;
if (debug)
pr_info("Delay set at: %llu ns\n", pkt_dev->delay);
if (debug)
printk(KERN_DEBUG "pktgen: dst_min set to: %s\n",
pkt_dev->dst_min);
what's the reasoning behind using one or the other?
-K
--
Kristian Larsson KLL-RIPE
+46 704 264511 kll@spritelink.net
^ permalink raw reply
* Re: [PATCH V7 3/8] posix clocks: introduce dynamic clocks
From: Thomas Gleixner @ 2010-12-16 20:56 UTC (permalink / raw)
To: Richard Cochran
Cc: linux-kernel, linux-api, netdev, Alan Cox, Arnd Bergmann,
Christoph Lameter, David Miller, John Stultz, Krzysztof Halasa,
Peter Zijlstra, Rodolfo Giometti
In-Reply-To: <8debe4d48d9a6484b7fbd35d8888524155fed977.1292512461.git.richard.cochran@omicron.at>
On Thu, 16 Dec 2010, Richard Cochran wrote:
> --- /dev/null
> +++ b/include/linux/posix-clock.h
> +/**
> + * struct posix_clock_fops - character device operations
> + *
> + * Every posix clock is represented by a character device. Drivers may
> + * optionally offer extended capabilities by implementing these
> + * functions. The character device file operations are first handled
> + * by the clock device layer, then passed on to the driver by calling
> + * these functions.
> + *
> + * The clock device layer already uses fp->private_data, so drivers
> + * are provided their private data via the 'priv' paramenter.
> + */
> +void *posix_clock_private(struct file *fp);
Leftover ? There is neither a caller nor an implementation
> +struct posix_clock_fops {
> + int (*fasync) (void *priv, int fd, struct file *file, int on);
> + int (*mmap) (void *priv, struct vm_area_struct *vma);
> + int (*open) (void *priv, fmode_t f_mode);
> + int (*release) (void *priv);
> + long (*ioctl) (void *priv, unsigned int cmd, unsigned long arg);
> + long (*compat_ioctl) (void *priv, unsigned int cmd, unsigned long arg);
Do we really need a compat_ioctl ?
> + ssize_t (*read) (void *priv, uint flags, char __user *buf, size_t cnt);
> + unsigned int (*poll) (void *priv, struct file *file, poll_table *wait);
> --- a/kernel/time/Makefile
> +++ b/kernel/time/Makefile
> @@ -1,4 +1,5 @@
> -obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o timecompare.o timeconv.o
> +obj-y += timekeeping.o ntp.o clocksource.o jiffies.o timer_list.o \
> +timecompare.o timeconv.o posix-clock.o
Please start a new obj-y += line instead
> --- /dev/null
> +++ b/kernel/time/posix-clock.c
> +
> +#define MAX_CLKDEV BITS_PER_LONG
Please put some more space between the MAX_CLKDEV and BITS_PER_LONG. I
had to look three times to not read it as a single word.
> +static DECLARE_BITMAP(clocks_map, MAX_CLKDEV);
> +static DEFINE_MUTEX(clocks_mutex); /* protects 'clocks_map' */
Please avoid tail comments
> +struct posix_clock {
> + struct posix_clock_operations *ops;
> + struct cdev cdev;
> + struct kref kref;
> + struct mutex mux;
> + void *priv;
You can get away with that private pointer and all the void *
arguments to the various posix_clock_operations, if you mandate that
the posix_clock_operations are embedded into a clock specific data
structure.
So void *priv would become struct posix_clock_operations *clkops and
you can get your private data in the clock implementation with
container_of().
> + int index;
> + bool zombie;
Ths field is only set, but nowhere else used. What's the purpose ?
Leftover ?
> +};
> +
> +static void delete_clock(struct kref *kref);
> +
> +
> +static int posix_clock_open(struct inode *inode, struct file *fp)
> +{
> + struct posix_clock *clk =
> + container_of(inode->i_cdev, struct posix_clock, cdev);
> +
> + kref_get(&clk->kref);
> + fp->private_data = clk;
fp->private_data should only be set on success. And this will leak a
ref count when the clock open function fails.
What's that kref protecting here ?
> +
> + if (clk->ops->fops.open)
> + return clk->ops->fops.open(clk->priv, fp->f_mode);
> + else
> + return 0;
> +}
> +
> +static int posix_clock_release(struct inode *inode, struct file *fp)
> +{
> + struct posix_clock *clk = fp->private_data;
> + int err = 0;
fp->private_data should be set to NULL in the release function.
> + if (clk->ops->fops.release)
> + err = clk->ops->fops.release(clk->priv);
> +
> + kref_put(&clk->kref, delete_clock);
> +struct posix_clock *posix_clock_create(struct posix_clock_operations *cops,
> + dev_t devid, void *priv)
> +{
> + struct posix_clock *clk;
> + int err;
> +
> + mutex_lock(&clocks_mutex);
> +
> + err = -ENOMEM;
> + clk = kzalloc(sizeof(*clk), GFP_KERNEL);
> + if (!clk)
> + goto no_memory;
> +
> + err = -EBUSY;
> + clk->index = find_first_zero_bit(clocks_map, MAX_CLKDEV);
> + if (clk->index < MAX_CLKDEV)
> + set_bit(clk->index, clocks_map);
> + else
> + goto no_index;
if (clk->index >= MAX_CLKDEV)
goto no_index;
set_bit(clk->index, clocks_map);
Makes it better readable.
> +static void delete_clock(struct kref *kref)
> +{
> + struct posix_clock *clk =
> + container_of(kref, struct posix_clock, kref);
> +
> + mutex_lock(&clocks_mutex);
> + clear_bit(clk->index, clocks_map);
> + mutex_unlock(&clocks_mutex);
> +
> + mutex_destroy(&clk->mux);
> + kfree(clk);
> +}
> +
> +void posix_clock_destroy(struct posix_clock *clk)
> +{
> + cdev_del(&clk->cdev);
> +
> + mutex_lock(&clk->mux);
> + clk->zombie = true;
> + mutex_unlock(&clk->mux);
> +
> + kref_put(&clk->kref, delete_clock);
I still have some headache to understand that kref / delete_clock
magic here.
Thanks,
tglx
^ permalink raw reply
* Re: [PATCH net-next-2.6 0/3] bonding fixes
From: David Miller @ 2010-12-16 20:42 UTC (permalink / raw)
To: bhutchings; +Cc: fubar, netdev, linux-net-drivers
In-Reply-To: <1292264283.9860.12.camel@bwh-desktop>
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 13 Dec 2010 18:18:03 +0000
> These should of course go to net-2.6, not net-next-2.6.
I'll apply these, thanks a lot Ben.
^ permalink raw reply
* Re: [PATCH v3 net-next-2.6] netfilter: x_tables: dont block BH while reading counters
From: Eric Dumazet @ 2010-12-16 20:40 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Jesper Dangaard Brouer, Patrick McHardy, Arnaldo Carvalho de Melo,
Steven Rostedt, Alexander Duyck, netfilter-devel, netdev,
Peter P Waskiewicz Jr
In-Reply-To: <20101216121238.53deb370@nehalam>
Le jeudi 16 décembre 2010 à 12:12 -0800, Stephen Hemminger a écrit :
> You changed from:
> get local cpu counters
> then sum other cpus
> to:
> sum all cpu's.
>
> This is fine, and will give the same answer.
>
Yes, I did that because the fetch of individual counters is now more
complex.
I tried to make code smaller and less complex. Eventually we are now
allowed to be preempted and even migrated to another cpu during the
process.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH kernel 2.6.37-rc5] axnet_cs: move id (0x1bf, 0x2328) to axnet_cs
From: David Miller @ 2010-12-16 20:39 UTC (permalink / raw)
To: ken_kawasaki; +Cc: netdev
In-Reply-To: <20101213212724.353e84f1.ken_kawasaki@spring.nifty.jp>
From: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>
Date: Mon, 13 Dec 2010 21:27:24 +0900
>
> axnet_cs:
> Accton EN2328 or compatible (id: 0x01bf, 0x2328) uses Asix chip.
> So it works better with axnet_cs instead of pcnet_cs.
>
> Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>
Applied, thank you.
^ permalink raw reply
* Re: pktgen IP address stepping
From: Kristian Larsson @ 2010-12-16 20:34 UTC (permalink / raw)
To: Joe Perches; +Cc: David S. Miller, netdev
In-Reply-To: <1292527132.29894.52.camel@Joe-Laptop>
On Thu, Dec 16, 2010 at 11:18:52AM -0800, Joe Perches wrote:
> On Thu, 2010-12-16 at 19:30 +0100, Kristian Larsson wrote:
> > > I suggest using actual ipv4 addresses (and ipv6 if you ever
> > > need it) as increment/mask.
> > Not sure I follow you, should the stepping be specified as
> > 0.0.0.213 if I would like to increase with 213 or 0.0.255.255 if
> > I want to increase by a /16 at a time?
>
> More like:
>
> Initial IPv4 address: 0.0.0.1
> Step: 0.1.0.0
> Mask: 127.0.0.1
>
> addr = (addr + step) & mask;
>
> Cycles through all /16's below 128
Which I guess is doable with the patch using these options;
(dst|src)_min 0.0.0.1
(dst|src)_max 128.0.0.0
(dst|src)_step 65535
but you want (dst|src)_step to be specified using dot-quad
instead of an integer?
Kristian.
--
Kristian Larsson KLL-RIPE
+46 704 264511 kll@spritelink.net
^ permalink raw reply
* Re: [PATCH] net: use NUMA_NO_NODE instead of the magic number -1
From: David Miller @ 2010-12-16 20:36 UTC (permalink / raw)
To: xiaosuo; +Cc: eric.dumazet, therbert, netdev
In-Reply-To: <1292332155-25151-1-git-send-email-xiaosuo@gmail.com>
From: Changli Gao <xiaosuo@gmail.com>
Date: Tue, 14 Dec 2010 21:09:15 +0800
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH 3/3] bonding: add the debugfs interface to see RLB hash table
From: David Miller @ 2010-12-16 20:35 UTC (permalink / raw)
To: izumi.taku; +Cc: netdev, fubar, eric.dumazet, shemminger
In-Reply-To: <4D05A96B.7020902@jp.fujitsu.com>
From: Taku Izumi <izumi.taku@jp.fujitsu.com>
Date: Mon, 13 Dec 2010 14:04:43 +0900
>
> This patch provices the debugfs interface to see RLB hash table
> like the following:
>
> # cat /sys/kernel/debug/bonding/bond0/rlb_hash_table
> SourceIP DestinationIP Destination MAC DEV
> 10.124.196.205 10.124.196.205 ff:ff:ff:ff:ff:ff eth4
> 10.124.196.205 10.124.196.81 00:19:99:XX:XX:XX eth3
> 10.124.196.205 10.124.196.1 00:21:d8:XX:XX:XX eth0
>
> This is helpful to check if the receive load balancing works as expected.
>
> Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Applied.
^ permalink raw reply
* Re: [PATCH 2/3] bonding: migrate some macros from bond_alb.c to bond_alb.h
From: David Miller @ 2010-12-16 20:35 UTC (permalink / raw)
To: izumi.taku; +Cc: netdev, fubar, eric.dumazet, shemminger
In-Reply-To: <4D05A91C.9060104@jp.fujitsu.com>
From: Taku Izumi <izumi.taku@jp.fujitsu.com>
Date: Mon, 13 Dec 2010 14:03:24 +0900
>
> This patch simply migrates some macros from bond_alb.c to bond_alb.h.
>
>
> Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Applied.
^ permalink raw reply
* Re: [PATCH] net: delete expired route in ip6_pmtu_deliver
From: David Miller @ 2010-12-16 20:28 UTC (permalink / raw)
To: avagin; +Cc: netdev, linux-kernel
In-Reply-To: <1292116811-22216-1-git-send-email-avagin@openvz.org>
From: Andrey Vagin <avagin@openvz.org>
Date: Sun, 12 Dec 2010 04:20:11 +0300
> The first big packets sent to a "low-MTU" client correctly
> triggers the creation of a temporary route containing the reduced MTU.
>
> But after the temporary route has expired, new ICMP6 "packet too big"
> will be sent, rt6_pmtu_discovery will find the previous EXPIRED route
> check that its mtu isn't bigger then in icmp packet and do nothing
> before the temporary route will not deleted by gc.
>
> I make the simple experiment:
> while :; do
> time ( dd if=/dev/zero bs=10K count=1 | ssh hostname dd of=/dev/null ) || break;
> done
>
> The "time" reports real 0m0.197s if a temporary route isn't expired, but
> it reports real 0m52.837s (!!!!) immediately after a temporare route has
> expired.
>
> Signed-off-by: Andrey Vagin <avagin@openvz.org>
Applied and queued up for -stable, thanks.
^ permalink raw reply
* Re: [PATCH] fix bug in Ethernet Channel Bonding Driver
From: David Miller @ 2010-12-16 20:24 UTC (permalink / raw)
To: dhillf; +Cc: netdev, linux-kernel
In-Reply-To: <AANLkTiksuf0ZDWyjH6ec9rSRoG+5ZdakA-Tha9MXLvy+@mail.gmail.com>
From: Hillf Danton <dhillf@gmail.com>
Date: Sat, 11 Dec 2010 12:54:11 +0800
> The returned slave is incorrect, if the net device under check is not
> charged yet by the master.
>
> Signed-off-by: Hillf Danton <dhillf@gmail.com>
Applied, thank you.
^ permalink raw reply
* Re: IPV6 loopback bound socket succeeds connecting to remote host
From: David Miller @ 2010-12-16 20:18 UTC (permalink / raw)
To: shanwei; +Cc: albertpretorius, netdev, yoshfuji, pekkas, jmorris
In-Reply-To: <4CF75BC3.1020606@cn.fujitsu.com>
From: Shan Wei <shanwei@cn.fujitsu.com>
Date: Thu, 02 Dec 2010 16:41:39 +0800
> Albert Pretorius wrote, at 12/02/2010 03:45 PM:
>> Hi
>>
>> --- On Wed, 1/12/10, Shan Wei <shanwei@cn.fujitsu.com> wrote:
>>> I think it make nonsense to translate data between loopback
>>> device and other e.g. eth0 device in same machine.
>>
>> I agree, RFC4291 makes it clear for IPV6 that no interface should accept traffic from loopback, I should not have tried to make it behave like IPV4.
>> I can not find an equivalent statement for IPV4 though, all I could find is this from RFC3330:
>>
>> 127.0.0.0/8 - This block is assigned for use as the Internet host
>> loopback address. A datagram sent by a higher level protocol to an
>> address anywhere within this block should loop back inside the host.
>> This is ordinarily implemented using only 127.0.0.1/32 for loopback,
>> but no addresses within this block should ever appear on any network
>> anywhere [RFC1700, page 5].
>>
>> Do you perhaps know?
>
> There are no same statement for IPv4 loopback address. I have checked RFC1122, RFC1700 and RFC5753 .
Shan, you really need to handle this in the ipv6 routing code.
Your approach will only modify socket based route handling, it will
not handle the ipv6 forwarding case which as per the quoted RFC
sections must be handled too.
^ permalink raw reply
* Re: [PATCH v3 net-next-2.6] netfilter: x_tables: dont block BH while reading counters
From: Stephen Hemminger @ 2010-12-16 20:12 UTC (permalink / raw)
To: Eric Dumazet
Cc: Jesper Dangaard Brouer, Patrick McHardy, Arnaldo Carvalho de Melo,
Steven Rostedt, Alexander Duyck, netfilter-devel, netdev,
Peter P Waskiewicz Jr
In-Reply-To: <1292529519.2655.4.camel@edumazet-laptop>
On Thu, 16 Dec 2010 20:58:39 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 16 décembre 2010 à 09:57 -0800, Stephen Hemminger a écrit :
> > On Thu, 16 Dec 2010 18:53:06 +0100
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > > @@ -759,7 +742,7 @@ static struct xt_counters *alloc_counters(const struct xt_table *table)
> > > * about).
> > > */
> > > countersize = sizeof(struct xt_counters) * private->number;
> > > - counters = vmalloc(countersize);
> > > + counters = vzalloc(countersize);
> > >
> > > if (counters == NULL)
> > > return ERR_PTR(-ENOMEM);
> > > @@ -1007,7 +990,7 @@ static int __do_replace(struct net *net, const char *name,
> > > struct arpt_entry *iter;
> > >
> > > ret = 0;
> > > - counters = vmalloc(num_counters * sizeof(struct xt_counters));
> > > + counters = vzalloc(num_counters * sizeof(struct xt_counters));
> > > if (!counters) {
> > > ret = -ENOMEM;
> > > goto out;
> >
> > This seems like a different and unrelated change.
> >
>
> Since you later Acked the patch, you probably know that we now provide
> to get_counters() a zeroed area, since we do a sum for each possible
> cpu, I am answering anyway for other readers ;)
>
> Thanks for reviewing !
You changed from:
get local cpu counters
then sum other cpus
to:
sum all cpu's.
This is fine, and will give the same answer.
--
^ permalink raw reply
* Re: [PATCH 00/12 net-next] cxgb4 update v2
From: David Miller @ 2010-12-16 20:10 UTC (permalink / raw)
To: dm; +Cc: netdev
In-Reply-To: <1292398615-26527-1-git-send-email-dm@chelsio.com>
From: Dimitris Michailidis <dm@chelsio.com>
Date: Tue, 14 Dec 2010 23:36:43 -0800
>
> This is v2 of the cxgb4 series. I've removed the fallback call to kcalloc in
> patch 12 per Eric's comment and restored an accidentally removed increment in
> patch 7.
All applied.
I made one change, I made sure that for Joe Perches's patch he will be
set as the author in the tree.
Next time you have a patch written by someone else, start the commit
message with:
From: NAME <EMAIL>
and GIT will take care of the rest.
Thanks.
^ permalink raw reply
* Re: [PATCH] e1000e: workaround missing power down mii control bit on 82571
From: Arthur Jones @ 2010-12-16 20:04 UTC (permalink / raw)
To: Allan, Bruce W; +Cc: Ben Hutchings, Kirsher, Jeffrey T, netdev@vger.kernel.org
In-Reply-To: <8DD2590731AB5D4C9DBF71A877482A9001773F76C4@orsmsx509.amr.corp.intel.com>
Hi Bruce, ...
On Thu, Dec 16, 2010 at 11:28:18AM -0800, Allan, Bruce W wrote:
> >-----Original Message-----
> >From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
> >Behalf Of Allan, Bruce W
> >Sent: Thursday, December 16, 2010 11:04 AM
> >To: Ben Hutchings; Arthur Jones
> >Cc: Kirsher, Jeffrey T; netdev@vger.kernel.org
> >Subject: RE: [PATCH] e1000e: workaround missing power down mii control bit on
> >82571
> >
> >>-----Original Message-----
> >>From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
> >>Behalf Of Ben Hutchings
> >>Sent: Thursday, December 16, 2010 10:57 AM
> >>To: Arthur Jones
> >>Cc: Kirsher, Jeffrey T; netdev@vger.kernel.org
> >>Subject: Re: [PATCH] e1000e: workaround missing power down mii control bit on
> >>82571
> >>
> >>Adding this special case into MDIO access seems like a really nasty
> >>hack. Surely the callers that set the control register should take care
> >>of this.
> >>
> >>Ben.
> >
> >Agreed. I am setting up to repro now to see if it is an actual hardware
> >issue or just a software bug; either way, this patch is not the correct
> >approach and I'll follow up shortly.
> >
> >Bruce.
>
> It's the reset in e1000_set_settings() which ignores that we had previously
> powered off the Phy. I'll go through the rest of the code and fix up this
> and any other occurrences of similar issues properly.
Thanks for having a look!
We do a read-modify-write there of
the PHY control register. We take
the rest of the bits as being good,
but, for some reason we don't get the
power down bit (always reads back
zero). Is this a known 82571 issue?
On 82574, e.g., we seem to get the
power down bit back when we read...
Are you sure you want to spread that
82571 specific logic all over the driver?
Arthur
>
> Thanks for reporting this issue Arthur.
>
> Bruce.
^ permalink raw reply
* Re: [PATCH v3 net-next-2.6] netfilter: x_tables: dont block BH while reading counters
From: Eric Dumazet @ 2010-12-16 19:58 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Jesper Dangaard Brouer, Patrick McHardy, Arnaldo Carvalho de Melo,
Steven Rostedt, Alexander Duyck, netfilter-devel, netdev,
Peter P Waskiewicz Jr
In-Reply-To: <20101216095709.18f16c53@nehalam>
Le jeudi 16 décembre 2010 à 09:57 -0800, Stephen Hemminger a écrit :
> On Thu, 16 Dec 2010 18:53:06 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > @@ -759,7 +742,7 @@ static struct xt_counters *alloc_counters(const struct xt_table *table)
> > * about).
> > */
> > countersize = sizeof(struct xt_counters) * private->number;
> > - counters = vmalloc(countersize);
> > + counters = vzalloc(countersize);
> >
> > if (counters == NULL)
> > return ERR_PTR(-ENOMEM);
> > @@ -1007,7 +990,7 @@ static int __do_replace(struct net *net, const char *name,
> > struct arpt_entry *iter;
> >
> > ret = 0;
> > - counters = vmalloc(num_counters * sizeof(struct xt_counters));
> > + counters = vzalloc(num_counters * sizeof(struct xt_counters));
> > if (!counters) {
> > ret = -ENOMEM;
> > goto out;
>
> This seems like a different and unrelated change.
>
Since you later Acked the patch, you probably know that we now provide
to get_counters() a zeroed area, since we do a sum for each possible
cpu, I am answering anyway for other readers ;)
Thanks for reviewing !
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [RFC PATCH] net: Implement read-only protection and COW'ing of metrics.
From: David Miller @ 2010-12-16 19:59 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
In-Reply-To: <1292529359.2655.2.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 16 Dec 2010 20:55:59 +0100
> What prevents fi->fib_metrics to disappear, if fib is destroyed, since
> we dont take a reference ?
Routing cache is flushed.
Hmm... perhaps we need to force-COW or revert to the default zero
metrics for any routing cache entries with reference counts?
Or maybe that's not even needed.
Because nobody should try to touch metrics without first doing a
dst->check(), especially after the RCU grace period, so it should be
OK no?
^ permalink raw reply
* Re: [RFC PATCH] net: Implement read-only protection and COW'ing of metrics.
From: Eric Dumazet @ 2010-12-16 19:55 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20101215.132113.189700977.davem@davemloft.net>
Le mercredi 15 décembre 2010 à 13:21 -0800, David Miller a écrit :
> Routing metrics are now copy-on-write.
>
> Initially a route entry points it's metrics at a read-only location.
> If a routing table entry exists, it will point there. Else it will
> point at the all zero metric place- holder 'dst_default_metrics'.
>
> The writeability state of the metrics is stored in the low bits of the
> metrics pointer, we have two bits left to spare if we want to store
> more states.
>
> For the initial implementation, COW is implemented simply via kmalloc.
> However future enhancements will change this to place the writable
> metrics somewhere else, in order to increase sharing. Very likely
> this "somewhere else" will be the inetpeer cache.
>
> Note also that this means that metrics updates may transiently fail
> if we cannot COW the metrics successfully.
>
> But even by itself, this patch should decrease memory usage and
> increase cache locality especially for routing workloads. In those
> cases the read-only metric copies stay in place and never get written
> to.
>
> TCP workloads where metrics get updated, and those rare cases where
> PMTU triggers occur, will take a very slight performance hit. But
> that hit will be alleviated when the long-term writable metrics
> move to a more sharable location.
>
> Since the metrics storage went from a u32 array of RTAX_MAX entries to
> what is essentially a pointer, some retooling of the dst_entry layout
> was necessary.
>
> Most importantly, we need to preserve the alignment of the reference
> count so that it doesn't share cache lines with the read-mostly state,
> as per Eric Dumazet's alignment assertion checks.
>
> The only non-trivial bit here is the move of the 'flags' member into
> the writeable cacheline. This is OK since we are always accessing the
> flags around the same moment when we made a modification to the
> reference count.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
Hi David
>
>
> @@ -1840,7 +1843,7 @@ static void rt_set_nexthop(struct rtable *rt, struct fib_result *res, u32 itag)
> if (FIB_RES_GW(*res) &&
> FIB_RES_NH(*res).nh_scope == RT_SCOPE_LINK)
> rt->rt_gateway = FIB_RES_GW(*res);
> - dst_import_metrics(dst, fi->fib_metrics);
> + dst_attach_metrics(dst, fi->fib_metrics, true);
I am a bit concerned about this part.
What prevents fi->fib_metrics to disappear, if fib is destroyed, since
we dont take a reference ?
^ permalink raw reply
* Re: [PATCH net-next 0/9] bnx2x: FCoE support
From: David Miller @ 2010-12-16 19:43 UTC (permalink / raw)
To: vladz; +Cc: netdev, eilong, dmitry, shmulikr
In-Reply-To: <1292255013.631.48.camel@lb-tlvb-vladz>
From: "Vladislav Zolotarov" <vladz@broadcom.com>
Date: Mon, 13 Dec 2010 17:43:33 +0200
> This patch-series adds FCoE support to 57712 HW. It provides
> required intefaces for upper-level driver (CNIC) and handles
> FCoE related HW/FW initialization and flows.
>
> Since the FW files are too big (patches 6 and 8) and will not
> pass the mailing list, it is also located at:
> http://linux.broadcom.com/eilong/1.62.00-2
>
> It's a second attempt to submit this patch series after fixing
> a Tx hash distribution fairness.
Applied.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox