* Re: Network performance with small packets
From: Shirley Ma @ 2011-01-27 20:15 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Steve Dobbelstein, kvm, netdev
In-Reply-To: <20110127200548.GE5228@redhat.com>
On Thu, 2011-01-27 at 22:05 +0200, Michael S. Tsirkin wrote:
> Interesting. Could this be a variant of the now famous bufferbloat
> then?
>
> I guess we could drop some packets if we see we are not keeping up. For
> example if we see that the ring is > X% full, we could quickly complete
> Y% without transmitting packets on. Or maybe we should drop some bytes
> not packets.
It's worth trying to figure out the best approach. I will make a
patch.
> >
> > Requesting guest notifications and extra interrupts is what we want to
> > avoid, to reduce VM exits and save CPU. I don't think it's good.
>
> Yes, but how do you explain the regression?
> One simple theory is that guest net stack became faster
> and so the host can't keep up.
Yes, that's what I think here. Some qdisc code has been changed
recently.
> >
> > By polling the vq a bit more aggressively, you meant vhost, right?
> >
> > Shirley
>
> Yes.
I had a similar patch before; I can modify it and test it out.
Shirley
^ permalink raw reply
* Re: Network performance with small packets
From: Michael S. Tsirkin @ 2011-01-27 20:05 UTC (permalink / raw)
To: Shirley Ma; +Cc: Steve Dobbelstein, kvm, netdev
In-Reply-To: <1296157547.1640.45.camel@localhost.localdomain>
On Thu, Jan 27, 2011 at 11:45:47AM -0800, Shirley Ma wrote:
> On Thu, 2011-01-27 at 21:31 +0200, Michael S. Tsirkin wrote:
> > Well slowing down the guest does not sound hard - for example we can
> > request guest notifications, or send extra interrupts :)
> > A slightly more sophisticated thing to try is to
> > poll the vq a bit more aggressively.
> > For example if we handled some requests and now tx vq is empty,
> > reschedule and yield. Worth a try?
>
> I used dropping packets at a high level to slow down TX. I am still
> thinking about what the right approach is here.
Interesting. Could this be a variant of the now famous bufferbloat then?
I guess we could drop some packets if we see we are not keeping up. For
example if we see that the ring is > X% full, we could quickly complete
Y% without transmitting packets on. Or maybe we should drop some bytes
not packets.
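The ring-occupancy idea above can be sketched as follows. This is a minimal plain-C model, assuming a ring tracked by free-running head/tail indices; `struct tx_ring`, `ring_fill_percent()` and `ring_shed_backlog()` are hypothetical names, not vhost or virtio interfaces, and X/Y are simply parameters. "Completing" a buffer here only advances the consumer index; a real host would also have to hand the buffer back to the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical TX ring model: head is where the guest adds buffers,
 * tail is where the host consumes them. Indices grow monotonically,
 * so occupancy is simply head - tail. */
struct tx_ring {
	size_t size;      /* number of slots */
	size_t head;      /* next slot the producer (guest) fills */
	size_t tail;      /* next slot the consumer (host) drains */
	size_t dropped;   /* buffers completed without transmission */
};

static size_t ring_used(const struct tx_ring *r)
{
	return r->head - r->tail;
}

static unsigned ring_fill_percent(const struct tx_ring *r)
{
	return (unsigned)(ring_used(r) * 100 / r->size);
}

/* If the ring is more than high_pct full, complete drop_pct of the
 * queued buffers immediately without putting them on the wire.
 * Returns true if anything was shed. */
static bool ring_shed_backlog(struct tx_ring *r, unsigned high_pct,
			      unsigned drop_pct)
{
	size_t n;

	if (ring_fill_percent(r) <= high_pct)
		return false;

	n = ring_used(r) * drop_pct / 100;
	r->tail += n;             /* mark as completed */
	r->dropped += n;
	return n > 0;
}
```

A byte-based variant, as suggested in the last sentence, would sum buffer lengths instead of counting slots.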
>
> Requesting guest notifications and extra interrupts is what we want to
> avoid, to reduce VM exits and save CPU. I don't think it's good.
Yes, but how do you explain the regression?
One simple theory is that guest net stack became faster
and so the host can't keep up.
>
> By polling the vq a bit more aggressively, you meant vhost, right?
>
> Shirley
Yes.
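A rough sketch of "poll the vq a bit more aggressively": after draining the queue, re-poll a bounded number of times, yielding the CPU between polls, before going back to waiting for a notification. Everything here is hypothetical (`struct vq_model`, `handle_tx_poll()`), and the `late_arrivals` field exists only to simulate the guest queuing more work during a yield; vhost's real worker loop is structured differently.

```c
#include <assert.h>
#include <sched.h>

/* Hypothetical TX virtqueue model. 'pending' is the number of buffers
 * already queued by the guest; 'late_arrivals' simulates buffers the
 * guest queues while the host is polling (a test hook only). */
struct vq_model {
	int pending;
	int late_arrivals;
};

/* Drain the queue; once it runs empty, poll again up to
 * max_idle_polls times, yielding between polls, before giving up
 * (where a real implementation would re-enable guest notifications
 * and sleep). Returns the number of buffers handled. */
static int handle_tx_poll(struct vq_model *vq, int max_idle_polls)
{
	int handled = 0;
	int idle = 0;

	for (;;) {
		if (vq->pending > 0) {
			vq->pending--;          /* "transmit" one buffer */
			handled++;
			idle = 0;               /* found work: reset the budget */
			continue;
		}
		if (idle >= max_idle_polls)
			break;                  /* idle budget exhausted */
		idle++;
		sched_yield();                  /* give the guest a chance to run */

		/* Simulation only: the guest queues more work meanwhile. */
		vq->pending += vq->late_arrivals;
		vq->late_arrivals = 0;
	}
	return handled;
}
```

The idle budget trades host CPU time spent re-polling against the VM exits and interrupts Shirley wants to avoid; a budget of 0 means stopping as soon as the queue is empty.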
^ permalink raw reply
* [PATCH net-next-2.6] sungem: Use net_device's internal stats
From: Denis Kirjanov @ 2011-01-27 19:54 UTC (permalink / raw)
To: davem; +Cc: netdev
Use the net_device_stats instance embedded in struct net_device instead of a private copy in struct gem.
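A minimal user-space sketch of the pattern the patch moves to, assuming simplified stand-in types (`struct netdev_model`, `struct gem_model` and `model_get_stats()` are hypothetical, not kernel API): the counters live in the device structure itself, and the get_stats hook folds the read-then-clear hardware counters into them and returns a pointer to that embedded instance. The gp->lock / gp->tx_lock locking of the real gem_get_stats() is omitted.

```c
#include <assert.h>

/* Trimmed-down stand-in for struct net_device_stats. */
struct stats_model {
	unsigned long rx_packets;
	unsigned long rx_crc_errors;
};

/* Stand-in for struct net_device: the counters live here. */
struct netdev_model {
	struct stats_model stats;
	void *priv;                   /* driver-private area */
};

/* Driver-private state: no private copy of the counters any more,
 * only the (simulated) hardware state needed to update them. */
struct gem_model {
	unsigned int fcs_err_reg;     /* simulated hardware error counter */
	int running;
};

/* Same shape as gem_get_stats(): fold the hardware counter into
 * dev->stats, clear the register, return the embedded instance. */
static struct stats_model *model_get_stats(struct netdev_model *dev)
{
	struct gem_model *gp = dev->priv;

	if (gp->running) {
		dev->stats.rx_crc_errors += gp->fcs_err_reg;
		gp->fcs_err_reg = 0;  /* writel(0, ...) in the real driver */
	}
	return &dev->stats;
}
```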
Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
---
drivers/net/sungem.c | 58 +++++++++++++++++++++++++-------------------------
drivers/net/sungem.h | 1 -
2 files changed, 29 insertions(+), 30 deletions(-)
diff --git a/drivers/net/sungem.c b/drivers/net/sungem.c
index 1c5408f..c1a3448 100644
--- a/drivers/net/sungem.c
+++ b/drivers/net/sungem.c
@@ -320,28 +320,28 @@ static int gem_txmac_interrupt(struct net_device *dev, struct gem *gp, u32 gem_s
if (txmac_stat & MAC_TXSTAT_URUN) {
netdev_err(dev, "TX MAC xmit underrun\n");
- gp->net_stats.tx_fifo_errors++;
+ dev->stats.tx_fifo_errors++;
}
if (txmac_stat & MAC_TXSTAT_MPE) {
netdev_err(dev, "TX MAC max packet size error\n");
- gp->net_stats.tx_errors++;
+ dev->stats.tx_errors++;
}
/* The rest are all cases of one of the 16-bit TX
* counters expiring.
*/
if (txmac_stat & MAC_TXSTAT_NCE)
- gp->net_stats.collisions += 0x10000;
+ dev->stats.collisions += 0x10000;
if (txmac_stat & MAC_TXSTAT_ECE) {
- gp->net_stats.tx_aborted_errors += 0x10000;
- gp->net_stats.collisions += 0x10000;
+ dev->stats.tx_aborted_errors += 0x10000;
+ dev->stats.collisions += 0x10000;
}
if (txmac_stat & MAC_TXSTAT_LCE) {
- gp->net_stats.tx_aborted_errors += 0x10000;
- gp->net_stats.collisions += 0x10000;
+ dev->stats.tx_aborted_errors += 0x10000;
+ dev->stats.collisions += 0x10000;
}
/* We do not keep track of MAC_TXSTAT_FCE and
@@ -469,20 +469,20 @@ static int gem_rxmac_interrupt(struct net_device *dev, struct gem *gp, u32 gem_s
u32 smac = readl(gp->regs + MAC_SMACHINE);
netdev_err(dev, "RX MAC fifo overflow smac[%08x]\n", smac);
- gp->net_stats.rx_over_errors++;
- gp->net_stats.rx_fifo_errors++;
+ dev->stats.rx_over_errors++;
+ dev->stats.rx_fifo_errors++;
ret = gem_rxmac_reset(gp);
}
if (rxmac_stat & MAC_RXSTAT_ACE)
- gp->net_stats.rx_frame_errors += 0x10000;
+ dev->stats.rx_frame_errors += 0x10000;
if (rxmac_stat & MAC_RXSTAT_CCE)
- gp->net_stats.rx_crc_errors += 0x10000;
+ dev->stats.rx_crc_errors += 0x10000;
if (rxmac_stat & MAC_RXSTAT_LCE)
- gp->net_stats.rx_length_errors += 0x10000;
+ dev->stats.rx_length_errors += 0x10000;
/* We do not track MAC_RXSTAT_FCE and MAC_RXSTAT_VCE
* events.
@@ -594,7 +594,7 @@ static int gem_abnormal_irq(struct net_device *dev, struct gem *gp, u32 gem_stat
if (netif_msg_rx_err(gp))
printk(KERN_DEBUG "%s: no buffer for rx frame\n",
gp->dev->name);
- gp->net_stats.rx_dropped++;
+ dev->stats.rx_dropped++;
}
if (gem_status & GREG_STAT_RXTAGERR) {
@@ -602,7 +602,7 @@ static int gem_abnormal_irq(struct net_device *dev, struct gem *gp, u32 gem_stat
if (netif_msg_rx_err(gp))
printk(KERN_DEBUG "%s: corrupt rx tag framing\n",
gp->dev->name);
- gp->net_stats.rx_errors++;
+ dev->stats.rx_errors++;
goto do_reset;
}
@@ -684,7 +684,7 @@ static __inline__ void gem_tx(struct net_device *dev, struct gem *gp, u32 gem_st
break;
}
gp->tx_skbs[entry] = NULL;
- gp->net_stats.tx_bytes += skb->len;
+ dev->stats.tx_bytes += skb->len;
for (frag = 0; frag <= skb_shinfo(skb)->nr_frags; frag++) {
txd = &gp->init_block->txd[entry];
@@ -696,7 +696,7 @@ static __inline__ void gem_tx(struct net_device *dev, struct gem *gp, u32 gem_st
entry = NEXT_TX(entry);
}
- gp->net_stats.tx_packets++;
+ dev->stats.tx_packets++;
dev_kfree_skb_irq(skb);
}
gp->tx_old = entry;
@@ -738,6 +738,7 @@ static __inline__ void gem_post_rxds(struct gem *gp, int limit)
static int gem_rx(struct gem *gp, int work_to_do)
{
+ struct net_device *dev = gp->dev;
int entry, drops, work_done = 0;
u32 done;
__sum16 csum;
@@ -782,15 +783,15 @@ static int gem_rx(struct gem *gp, int work_to_do)
len = (status & RXDCTRL_BUFSZ) >> 16;
if ((len < ETH_ZLEN) || (status & RXDCTRL_BAD)) {
- gp->net_stats.rx_errors++;
+ dev->stats.rx_errors++;
if (len < ETH_ZLEN)
- gp->net_stats.rx_length_errors++;
+ dev->stats.rx_length_errors++;
if (len & RXDCTRL_BAD)
- gp->net_stats.rx_crc_errors++;
+ dev->stats.rx_crc_errors++;
/* We'll just return it to GEM. */
drop_it:
- gp->net_stats.rx_dropped++;
+ dev->stats.rx_dropped++;
goto next;
}
@@ -843,8 +844,8 @@ static int gem_rx(struct gem *gp, int work_to_do)
netif_receive_skb(skb);
- gp->net_stats.rx_packets++;
- gp->net_stats.rx_bytes += len;
+ dev->stats.rx_packets++;
+ dev->stats.rx_bytes += len;
next:
entry = NEXT_RX(entry);
@@ -2472,7 +2473,6 @@ static int gem_resume(struct pci_dev *pdev)
static struct net_device_stats *gem_get_stats(struct net_device *dev)
{
struct gem *gp = netdev_priv(dev);
- struct net_device_stats *stats = &gp->net_stats;
spin_lock_irq(&gp->lock);
spin_lock(&gp->tx_lock);
@@ -2481,17 +2481,17 @@ static struct net_device_stats *gem_get_stats(struct net_device *dev)
* so we shield against this
*/
if (gp->running) {
- stats->rx_crc_errors += readl(gp->regs + MAC_FCSERR);
+ dev->stats.rx_crc_errors += readl(gp->regs + MAC_FCSERR);
writel(0, gp->regs + MAC_FCSERR);
- stats->rx_frame_errors += readl(gp->regs + MAC_AERR);
+ dev->stats.rx_frame_errors += readl(gp->regs + MAC_AERR);
writel(0, gp->regs + MAC_AERR);
- stats->rx_length_errors += readl(gp->regs + MAC_LERR);
+ dev->stats.rx_length_errors += readl(gp->regs + MAC_LERR);
writel(0, gp->regs + MAC_LERR);
- stats->tx_aborted_errors += readl(gp->regs + MAC_ECOLL);
- stats->collisions +=
+ dev->stats.tx_aborted_errors += readl(gp->regs + MAC_ECOLL);
+ dev->stats.collisions +=
(readl(gp->regs + MAC_ECOLL) +
readl(gp->regs + MAC_LCOLL));
writel(0, gp->regs + MAC_ECOLL);
@@ -2501,7 +2501,7 @@ static struct net_device_stats *gem_get_stats(struct net_device *dev)
spin_unlock(&gp->tx_lock);
spin_unlock_irq(&gp->lock);
- return &gp->net_stats;
+ return &dev->stats;
}
static int gem_set_mac_address(struct net_device *dev, void *addr)
diff --git a/drivers/net/sungem.h b/drivers/net/sungem.h
index 1990546..ede0178 100644
--- a/drivers/net/sungem.h
+++ b/drivers/net/sungem.h
@@ -994,7 +994,6 @@ struct gem {
u32 status;
struct napi_struct napi;
- struct net_device_stats net_stats;
int tx_fifo_sz;
int rx_fifo_sz;
--
1.7.3.4
^ permalink raw reply related
* Re: [PATCH RESEND 0/4] Fix kconfig breakage wrt to CONFIGFS_FS
From: Nicholas A. Bellinger @ 2011-01-27 19:52 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, linux-kernel, linux-fsdevel, linux-netdev,
Joel Becker, Randy Dunlap, Stephen Rothwell, Americo Wang,
David Miller, Andrew Morton
In-Reply-To: <20110127194157.GA7565@elte.hu>
On Thu, 2011-01-27 at 20:41 +0100, Ingo Molnar wrote:
> * Nicholas A. Bellinger <nab@linux-iscsi.org> wrote:
>
> > From: Nicholas Bellinger <nab@linux-iscsi.org>
> >
> > Hi Linus,
> >
> > The following four patches address the recent Kconfig CONFIGFS_FS ->
> > 'select SYSFS' change for GFS2_FS, and the 'depends && SYSFS && CONFIGFS_FS'
> > breakage for NETCONSOLE_DYNAMIC, DLM, and OCFS2_FS in .38-rc2.
> >
> > Please review and consider pulling from:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/nab/scsi-post-merge-2.6.git for-linus-v2
>
> I had the original for-linus branch tested rather extensively, and that one had no
> problems. What's different in -v2?
>
Hi Ingo,
The extra code is patch #1, which fixes an additional GFS2_FS warning
reported by Randy that appeared after the initial CONFIGFS_FS -> 'select
SYSFS' conversion.
This patch for GFS2_FS was in scsi-post-merge-2.6.git/for-linus, but was
not included in the shortlog/stat to Linus. So considering my current
(poor) record with Kconfig changes in mainline, I figured that sending
them out to the list one more time for review would make the most
sense. ;)
Thanks,
--nab
^ permalink raw reply
* Re: [PATCH RESEND 0/4] Fix kconfig breakage wrt to CONFIGFS_FS
From: Ingo Molnar @ 2011-01-27 19:41 UTC (permalink / raw)
To: Nicholas A. Bellinger
Cc: Linus Torvalds, linux-kernel, linux-fsdevel, linux-netdev,
Joel Becker, Randy Dunlap, Stephen Rothwell, Americo Wang,
David Miller, Andrew Morton
In-Reply-To: <1296155430-3796-1-git-send-email-nab@linux-iscsi.org>
* Nicholas A. Bellinger <nab@linux-iscsi.org> wrote:
> From: Nicholas Bellinger <nab@linux-iscsi.org>
>
> Hi Linus,
>
> The following four patches address the recent Kconfig CONFIGFS_FS ->
> 'select SYSFS' change for GFS2_FS, and the 'depends && SYSFS && CONFIGFS_FS'
> breakage for NETCONSOLE_DYNAMIC, DLM, and OCFS2_FS in .38-rc2.
>
> Please review and consider pulling from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/nab/scsi-post-merge-2.6.git for-linus-v2
I had the original for-linus branch tested rather extensively, and that one had no
problems. What's different in -v2?
Thanks,
Ingo
^ permalink raw reply
* Re: [PATCH net-next-2.6] net: fix dev_seq_next()
From: Paul E. McKenney @ 2011-01-27 19:33 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1296101282.1783.54.camel@edumazet-laptop>
On Thu, Jan 27, 2011 at 05:08:02AM +0100, Eric Dumazet wrote:
> Paul, the following comment in include/linux/rculist.h is misleading :
>
> "Why is there no list_empty_rcu()? Because list_empty() serves this
> purpose..."
>
> This is probably why I made the error ;)
>
> list_empty() has a meaning only if state cannot change right after its
> use.
>
> In an rcu_read_lock() section, state _can_ change, so there is no way
> list_empty() can be used at all.
My apologies for my messup!!!
So, there are two things that I need to fix:
1. There needs to be a list_empty_rcu() which contains an
rcu_access_pointer() in order to keep sparse happy.
2. The comment at the beginning of include/linux/rculist.h
needs to warn that the return value from this new
list_empty_rcu() API can become instantly obsolete,
so the caller must either hold the update-side lock
or be prepared to deal with obsolete values.
Or am I missing something?
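A user-space sketch of the list_empty_rcu() described above, assuming C11 atomics as a stand-in for the kernel primitives: the head's forward pointer is loaded exactly once and only compared, never dereferenced, which is why an rcu_access_pointer()-style load (rather than rcu_dereference()) would be enough. `struct list_node`, `list_empty_rcu_model()` and `list_add_rcu_model()` are hypothetical; this is not the rculist.h implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct list_head, with the forward pointer made
 * atomic so a lockless reader can load it exactly once. */
struct list_node {
	_Atomic(struct list_node *) next;
	struct list_node *prev;
};

static void list_node_init(struct list_node *head)
{
	atomic_store_explicit(&head->next, head, memory_order_relaxed);
	head->prev = head;
}

/* One load of head->next, compared against head. The pointer is never
 * dereferenced, so no ordering beyond a single load is needed. The
 * answer may be stale as soon as it is returned unless the caller
 * holds the update-side lock. */
static bool list_empty_rcu_model(struct list_node *head)
{
	return atomic_load_explicit(&head->next, memory_order_relaxed) == head;
}

/* Update side (caller holds the writer lock): insert n after head.
 * The release store publishes n's fields before n becomes reachable. */
static void list_add_rcu_model(struct list_node *n, struct list_node *head)
{
	struct list_node *first =
		atomic_load_explicit(&head->next, memory_order_relaxed);

	atomic_store_explicit(&n->next, first, memory_order_relaxed);
	n->prev = head;
	first->prev = n;
	atomic_store_explicit(&head->next, n, memory_order_release);
}
```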
Thanx, Paul
> Thanks
>
> [PATCH net-next-2.6] net: fix dev_seq_next()
>
> Commit c6d14c84566d (net: Introduce for_each_netdev_rcu() iterator)
> added a race in dev_seq_next().
>
> The rcu_dereference() call should be done _before_ testing the end of
> list, or we might return a wrong net_device if a concurrent thread
> changes net_device list under us.
>
> Note : discovered thanks to a sparse warning :
>
> net/core/dev.c:3919:9: error: incompatible types in comparison expression
> (different address spaces)
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
> Given this was discovered by code analysis rather than a bug report, I
> prepared a patch for net-next-2.6. Once fully tested, this could be
> backported to 2.6.33
>
> include/linux/netdevice.h | 9 ++++++++-
> net/core/dev.c | 11 +++++++----
> 2 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 8858422..c7d7074 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1447,7 +1447,7 @@ static inline struct net_device *next_net_device_rcu(struct net_device *dev)
> struct net *net;
>
> net = dev_net(dev);
> - lh = rcu_dereference(dev->dev_list.next);
> + lh = rcu_dereference(list_next_rcu(&dev->dev_list));
> return lh == &net->dev_base_head ? NULL : net_device_entry(lh);
> }
>
> @@ -1457,6 +1457,13 @@ static inline struct net_device *first_net_device(struct net *net)
> net_device_entry(net->dev_base_head.next);
> }
>
> +static inline struct net_device *first_net_device_rcu(struct net *net)
> +{
> + struct list_head *lh = rcu_dereference(list_next_rcu(&net->dev_base_head));
> +
> + return lh == &net->dev_base_head ? NULL : net_device_entry(lh);
> +}
> +
> extern int netdev_boot_setup_check(struct net_device *dev);
> extern unsigned long netdev_boot_base(const char *prefix, int unit);
> extern struct net_device *dev_getbyhwaddr_rcu(struct net *net, unsigned short type,
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1b4c07f..ddd5df2 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4051,12 +4051,15 @@ void *dev_seq_start(struct seq_file *seq, loff_t *pos)
>
> void *dev_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> {
> - struct net_device *dev = (v == SEQ_START_TOKEN) ?
> - first_net_device(seq_file_net(seq)) :
> - next_net_device((struct net_device *)v);
> + struct net_device *dev = v;
> +
> + if (v == SEQ_START_TOKEN)
> + dev = first_net_device_rcu(seq_file_net(seq));
> + else
> + dev = next_net_device_rcu(dev);
>
> ++*pos;
> - return rcu_dereference(dev);
> + return dev;
> }
>
> void dev_seq_stop(struct seq_file *seq, void *v)
>
>
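The core of the fix is to fetch the RCU-protected `next` pointer once and use that same snapshot for both the end-of-list test and the returned entry. A user-space sketch of that pattern, assuming C11 atomics (an acquire load stands in for rcu_dereference(), which in the kernel only needs dependency ordering) and a hypothetical `struct node` that is its own list link, so no net_device_entry()/container_of() step is needed:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
	_Atomic(struct node *) next;
	int ifindex;              /* payload, stands in for a net_device */
};

/* Load the shared pointer exactly once, then use that snapshot both
 * for the end-of-list test and for the value returned. Reading
 * pos->next a second time could observe a different node if a writer
 * changes the list in between. */
static struct node *next_or_null(struct node *pos, struct node *head)
{
	struct node *lh = atomic_load_explicit(&pos->next, memory_order_acquire);

	return lh == head ? NULL : lh;
}
```

Calling it with pos == head gives the first_net_device_rcu() case; any other pos gives the next_net_device_rcu() case.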
^ permalink raw reply
* Re: [PATCH] r8169: use RxFIFO overflow workaround for 8168c chipset
From: Ivan Vecera @ 2011-01-27 19:23 UTC (permalink / raw)
To: Francois Romieu; +Cc: netdev, Hayes
In-Reply-To: <20110127143219.GA7831@electric-eye.fr.zoreil.com>
On Thu, 2011-01-27 at 15:32 +0100, Francois Romieu wrote:
> Ivan Vecera <ivecera@redhat.com> :
> > I found that one of the 8168c chipsets (concretely XID 1c4000c0) starts
> > generating RxFIFO overflow errors. The result is an infinite loop in
> > interrupt handler as the RxFIFOOver is handled only for ...MAC_VER_11.
>
> Acked-by: as your patch ties it to a specific 8168 revision (CFG_METHOD_6
> in Realtek's parlance).
>
> Surprising as it may seem, unconditionally enabling it has not always
> produced the expected result. See 53f57357ff0afc37804f4e82ee3123e0c0a2cad6
> for instance. Realtek's r8168 driver ignores it most of the time as well.
>
> Was it normal high-load or pktgen like high load ?
The test case was migrating several KVM guests at the same time
between two hosts.
Ivan
^ permalink raw reply
* [PATCH 4/4] ocfs2: Make OCFS2_FS use select CONFIGFS_FS
From: Nicholas A. Bellinger @ 2011-01-27 19:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, linux-fsdevel, linux-netdev, Ingo Molnar,
Joel Becker, Randy Dunlap, Stephen Rothwell, Americo Wang,
David Miller, Andrew Morton, Nicholas Bellinger
In-Reply-To: <1296155430-3796-1-git-send-email-nab@linux-iscsi.org>
From: Nicholas Bellinger <nab@linux-iscsi.org>
Convert 'depends && SYSFS && CONFIGFS_FS' to 'select CONFIGFS_FS'
Reported-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
---
fs/ocfs2/Kconfig | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/fs/ocfs2/Kconfig b/fs/ocfs2/Kconfig
index 77a8de5..bb03131 100644
--- a/fs/ocfs2/Kconfig
+++ b/fs/ocfs2/Kconfig
@@ -1,6 +1,7 @@
config OCFS2_FS
tristate "OCFS2 file system support"
- depends on NET && SYSFS && CONFIGFS_FS
+ depends on NET
+ select CONFIGFS_FS
select JBD2
select CRC32
select QUOTA
--
1.7.3.5
^ permalink raw reply related
* [PATCH 3/4] dlm: Make DLM use select CONFIGFS_FS
From: Nicholas A. Bellinger @ 2011-01-27 19:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, linux-fsdevel, linux-netdev, Ingo Molnar,
Joel Becker, Randy Dunlap, Stephen Rothwell, Americo Wang,
David Miller, Andrew Morton, Nicholas Bellinger
In-Reply-To: <1296155430-3796-1-git-send-email-nab@linux-iscsi.org>
From: Nicholas Bellinger <nab@linux-iscsi.org>
Convert 'depends && SYSFS && CONFIGFS_FS' to 'select CONFIGFS_FS'
Reported-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
---
fs/dlm/Kconfig | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/dlm/Kconfig b/fs/dlm/Kconfig
index 1897eb1..4f65a50 100644
--- a/fs/dlm/Kconfig
+++ b/fs/dlm/Kconfig
@@ -1,7 +1,7 @@
menuconfig DLM
tristate "Distributed Lock Manager (DLM)"
- depends on EXPERIMENTAL && INET
- depends on SYSFS && CONFIGFS_FS && (IPV6 || IPV6=n)
+ depends on EXPERIMENTAL && INET && (IPV6 || IPV6=n)
+ select CONFIGFS_FS
select IP_SCTP
help
A general purpose distributed lock manager for kernel or userspace
--
1.7.3.5
^ permalink raw reply related
* [PATCH 2/4] net: Make NETCONSOLE_DYNAMIC use select CONFIGFS_FS
From: Nicholas A. Bellinger @ 2011-01-27 19:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, linux-fsdevel, linux-netdev, Ingo Molnar,
Joel Becker, Randy Dunlap, Stephen Rothwell, Americo Wang,
David Miller, Andrew Morton, Nicholas Bellinger
In-Reply-To: <1296155430-3796-1-git-send-email-nab@linux-iscsi.org>
From: Nicholas Bellinger <nab@linux-iscsi.org>
Convert 'depends && SYSFS && CONFIGFS_FS' to 'select CONFIGFS_FS'
Reported-by: Joel Becker <jlbec@evilplan.org>
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
---
drivers/net/Kconfig | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 0382332..3d23ebb 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -3389,7 +3389,8 @@ config NETCONSOLE
config NETCONSOLE_DYNAMIC
bool "Dynamic reconfiguration of logging targets"
- depends on NETCONSOLE && SYSFS && CONFIGFS_FS
+ depends on NETCONSOLE
+ select CONFIGFS_FS
help
This option enables the ability to dynamically reconfigure target
parameters (interface, IP addresses, port numbers, MAC addresses)
--
1.7.3.5
^ permalink raw reply related
* [PATCH 1/4] gfs2: Remove 'select SYSFS ...' from Kconfig
From: Nicholas A. Bellinger @ 2011-01-27 19:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, linux-fsdevel, linux-netdev, Ingo Molnar,
Joel Becker, Randy Dunlap, Stephen Rothwell, Americo Wang,
David Miller, Andrew Morton, Nicholas Bellinger
In-Reply-To: <1296155430-3796-1-git-send-email-nab@linux-iscsi.org>
From: Nicholas Bellinger <nab@linux-iscsi.org>
With CONFIGFS_FS now doing 'select SYSFS' by default, the extra
'select SYSFS if GFS2_FS_LOCKING_DLM' for GFS2_FS is now unnecessary.
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
---
fs/gfs2/Kconfig | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)
diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
index c465ae0..ff0a8eb 100644
--- a/fs/gfs2/Kconfig
+++ b/fs/gfs2/Kconfig
@@ -3,7 +3,6 @@ config GFS2_FS
depends on (64BIT || LBDAF)
select DLM if GFS2_FS_LOCKING_DLM
select CONFIGFS_FS if GFS2_FS_LOCKING_DLM
- select SYSFS if GFS2_FS_LOCKING_DLM
select IP_SCTP if DLM_SCTP
select FS_POSIX_ACL
select CRC32
--
1.7.3.5
^ permalink raw reply related
* [PATCH RESEND 0/4] Fix kconfig breakage wrt to CONFIGFS_FS
From: Nicholas A. Bellinger @ 2011-01-27 19:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: linux-kernel, linux-fsdevel, linux-netdev, Ingo Molnar,
Joel Becker, Randy Dunlap, Stephen Rothwell, Americo Wang,
David Miller, Andrew Morton, Nicholas Bellinger
From: Nicholas Bellinger <nab@linux-iscsi.org>
Hi Linus,
The following four patches address the recent Kconfig CONFIGFS_FS ->
'select SYSFS' change for GFS2_FS, and the 'depends && SYSFS && CONFIGFS_FS'
breakage for NETCONSOLE_DYNAMIC, DLM, and OCFS2_FS in .38-rc2.
Please review and consider pulling from:
git://git.kernel.org/pub/scm/linux/kernel/git/nab/scsi-post-merge-2.6.git for-linus-v2
currently against the latest linux-2.6.git/master HEAD:
commit 6fb1b304255efc5c4c93874ac8c066272e257e28
Merge: ac751ef 409550f
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed Jan 26 16:31:44 2011 +1000
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Thanks,
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Nicholas Bellinger (4):
gfs2: Remove 'select SYSFS ...' from Kconfig
net: Make NETCONSOLE_DYNAMIC use select CONFIGFS_FS
dlm: Make DLM use select CONFIGFS_FS
ocfs2: Make OCFS2_FS use select CONFIGFS_FS
drivers/net/Kconfig | 3 ++-
fs/dlm/Kconfig | 4 ++--
fs/gfs2/Kconfig | 1 -
fs/ocfs2/Kconfig | 3 ++-
4 files changed, 6 insertions(+), 5 deletions(-)
--
1.7.3.5
^ permalink raw reply
* Re: [PATCH V10 01/15] time: Introduce timekeeping_inject_offset
From: John Stultz @ 2011-01-27 18:48 UTC (permalink / raw)
To: Richard Cochran, Richard Cochran
Cc: linux-kernel, linux-api, netdev, Alan Cox, Arnd Bergmann,
Christoph Lameter, David Miller, Krzysztof Halasa, Peter Zijlstra,
Rodolfo Giometti, Thomas Gleixner, Benjamin Herrenschmidt,
H. Peter Anvin, Ingo Molnar, Mike Frysinger, Paul Mackerras,
Russell King
In-Reply-To: <6aec014551fc1d34924d6a7bcf97769867c15ba9.1296124770.git.richard.cochran@omicron.at>
On Thu, 2011-01-27 at 11:54 +0100, John Stultz wrote:
> This adds a kernel-internal timekeeping interface to add or subtract
> a fixed amount from CLOCK_REALTIME. This makes it so kernel users or
> interfaces trying to do so do not have to read the time, then add an
> offset and then call settimeofday(), which adds some extra error in
> comparision to just simply adding the offset in the kernel timekeeping
> core.
>
> CC: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
> ---
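To illustrate the error the changelog refers to, here is a small simulation with a fake nanosecond clock (`struct sim_clock`, `adjust_read_add_set()` and `adjust_inject()` are hypothetical and do not model the real timekeeping code or its locking): any time that passes between reading the clock and setting it is lost by the read-add-set approach, while applying the offset in place loses nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated CLOCK_REALTIME in nanoseconds. */
struct sim_clock {
	int64_t now_ns;
};

/* Time passing, e.g. syscall latency between two operations. */
static void advance(struct sim_clock *c, int64_t ns)
{
	c->now_ns += ns;
}

/* Read, add, write back. Whatever elapses between the read and the
 * write (gap_ns) is silently discarded. */
static void adjust_read_add_set(struct sim_clock *c, int64_t offset_ns,
				int64_t gap_ns)
{
	int64_t t = c->now_ns;          /* like gettimeofday() */

	advance(c, gap_ns);             /* latency before the set */
	c->now_ns = t + offset_ns;      /* like settimeofday() */
}

/* Apply the offset to the current value in one step, the behaviour
 * the changelog describes for an in-kernel inject-offset interface. */
static void adjust_inject(struct sim_clock *c, int64_t offset_ns)
{
	c->now_ns += offset_ns;
}
```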
Hey Richard,
Something seems wrong with your mail sending script. It looks like you're
sending the email under my name (John Stultz
<richardcochran@gmail.com>).
While I appreciate you preserving the patch author, and the signoffs are
right, you really should send the email under your own name.
The proper style is to keep the mail-header From: the same (i.e., Richard
Cochran <richardcochran@gmail.com>), but as the first line of the mail
body put:
From: Author Name <author@lemail.com>
thanks
-john
^ permalink raw reply
* Re: [PATCH net-next-2.6] drivers/net: remove some rcu sparse warnings
From: Michael Chan @ 2011-01-27 18:32 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Arnd Bergmann, Eilon Greenstein
In-Reply-To: <1296106103.1783.114.camel@edumazet-laptop>
On Wed, 2011-01-26 at 21:28 -0800, Eric Dumazet wrote:
> Add missing __rcu annotations and helpers.
> minor : Fix some rcu_dereference() calls in macvtap
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Arnd Bergmann <arnd@arndb.de>
> CC: Michael Chan <mchan@broadcom.com>
> CC: Eilon Greenstein <eilong@broadcom.com>
Thanks Eric. bnx2/bnx2x/cnic portions look good.
Acked-by: Michael Chan <mchan@broadcom.com>
^ permalink raw reply
* Re: skb_split in tcp_retransmit_skb question
From: Sergey Senozhatsky @ 2011-01-27 18:33 UTC (permalink / raw)
To: David S. Miller
Cc: Alexey Kuznetsov, Eric Dumazet, Pekka Savola (ipv6), netdev,
linux-kernel
In-Reply-To: <20110127152057.GA4153@swordfish.minsk.epam.com>
On (01/27/11 17:20), Sergey Senozhatsky wrote:
> Hello,
>
> Suppose we have the following scenario:
>
> tcp_write_timer ->
> tcp_retransmit_skb
>
> in tcp_retransmit_skb we have `if (skb->len > cur_mss)' evaluated to true, which leads
> to tcp_fragment(sk, skb, cur_mss, cur_mss) call. tcp_fragment calls skb_split(skb, buff, len)
> which, in turn, calls skb_split_no_header(skb, skb1, len, pos), where we have
> `skb_shinfo(skb)->nr_frags++' while in `for (i = 0; i < nfrags; i++)' loop.
Sorry for the noise. Alexey has pointed out that we have
skb_shinfo(skb)->nr_frags = 0 in skb_split_no_header. I have no idea how I missed it.
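Why the `nr_frags++` inside the loop is safe: the loop bound is a saved copy of the original count, and `skb->nr_frags` is reset to 0 before the loop, so the increments only re-count the fragments that stay with `skb`. A user-space sketch of that bookkeeping, assuming a hypothetical `struct frag_list` holding only fragment sizes and a split with no linear header (pos starting at 0); the page references and offsets that the real skb_split_no_header() also adjusts are omitted.

```c
#include <assert.h>

#define MAX_FRAGS 8

/* Hypothetical stand-in for skb_shinfo(skb): fragment sizes only. */
struct frag_list {
	int nr_frags;
	int size[MAX_FRAGS];
};

/* Split the paged data of skb at byte offset len, moving the tail
 * into skb1. The loop is bounded by the saved nfrags, so the
 * nr_frags++ below cannot extend it. */
static void split_frags(struct frag_list *skb, struct frag_list *skb1, int len)
{
	const int nfrags = skb->nr_frags;
	int pos = 0, k = 0, i;

	skb->nr_frags = 0;                /* re-counted below */
	for (i = 0; i < nfrags; i++) {
		int size = skb->size[i];

		if (pos + size > len) {
			skb1->size[k] = size;     /* frag (or its tail) moves */
			if (pos < len) {
				/* Frag straddles the split point (k is 0
				 * here): the head stays in skb, the rest
				 * goes to skb1. */
				skb1->size[k] -= len - pos;
				skb->size[i] = len - pos;
				skb->nr_frags++;
			}
			k++;
		} else {
			skb->nr_frags++;          /* frag stays with skb */
		}
		pos += size;
	}
	skb1->nr_frags = k;
}
```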
Thanks,
Sergey
^ permalink raw reply
* Re: question about nla_nest_cancel
From: Julia Lawall @ 2011-01-27 18:18 UTC (permalink / raw)
To: Ben Pfaff; +Cc: hadi, netdev
In-Reply-To: <87pqrikzd6.fsf@benpfaff.org>
On Thu, 27 Jan 2011, Ben Pfaff wrote:
> Julia Lawall <julia@diku.dk> writes:
>
> > I find numerous occurrences of code like the following, in which nest ends
> > up with the value NULL and then nla_nest_cancel is called with nest as the
> > second argument. But nla_nest_cancel just calls nlmsg_trim with the same
> > second argument, and nlmsg_trim does nothing if its second argument is
> > NULL. Is there any reason to keep these calls?
>
> I think that you are missing that NLA_PUT() contains an internal
> "goto nla_put_failure;". If that branch is taken, then
> nla_nest_cancel() trims off the nested attribute. So just
> removing the call to nla_nest_cancel() would change behavior in
> that case.
Indeed. Thank you for the explanation.
julia
^ permalink raw reply
* Re: question about nla_nest_cancel
From: Ben Pfaff @ 2011-01-27 17:52 UTC (permalink / raw)
To: Julia Lawall; +Cc: hadi, netdev
In-Reply-To: <Pine.LNX.4.64.1101271804530.13796@pc-004.diku.dk>
Julia Lawall <julia@diku.dk> writes:
> I find numerous occurrences of code like the following, in which nest ends
> up with the value NULL and then nla_nest_cancel is called with nest as the
> second argument. But nla_nest_cancel just calls nlmsg_trim with the same
> second argument, and nlmsg_trim does nothing if its second argument is
> NULL. Is there any reason to keep these calls?
I think that you are missing that NLA_PUT() contains an internal
"goto nla_put_failure;". If that branch is taken, then
nla_nest_cancel() trims off the nested attribute. So just
removing the call to nla_nest_cancel() would change behavior in
that case.
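Ben's point can be reproduced outside the kernel. The sketch below uses a hypothetical fixed-size buffer and a `PUT_OR_FAIL()` macro that, like NLA_PUT(), hides a `goto` to a label the caller must provide; none of these names are the real netlink API. When the nest start itself fails, the cancel is a no-op (Julia's observation); when a later put fails, the same cancel trims the partially built nest.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size message buffer, loosely modelled on an skb. */
struct msg {
	unsigned char data[32];
	size_t len;
};

/* Reserve a 4-byte nest header; return its position or NULL if full. */
static unsigned char *msg_nest_start(struct msg *m)
{
	unsigned char *hdr;

	if (m->len + 4 > sizeof(m->data))
		return NULL;
	hdr = m->data + m->len;
	m->len += 4;
	return hdr;
}

/* Drop everything added since msg_nest_start(); NULL is a no-op. */
static void msg_nest_cancel(struct msg *m, unsigned char *mark)
{
	if (mark)
		m->len = (size_t)(mark - m->data);
}

static int msg_put(struct msg *m, const void *p, size_t n)
{
	if (m->len + n > sizeof(m->data))
		return -1;
	memcpy(m->data + m->len, p, n);
	m->len += n;
	return 0;
}

/* Like NLA_PUT(): on failure, jumps to a label the caller must define. */
#define PUT_OR_FAIL(m, p, n) \
	do { if (msg_put((m), (p), (n)) < 0) goto put_failure; } while (0)

static int dump_opts(struct msg *m, const void *opt, size_t optlen)
{
	unsigned char *nest = msg_nest_start(m);

	if (nest == NULL)
		goto put_failure;
	PUT_OR_FAIL(m, opt, optlen);   /* the hidden goto lives here */
	return (int)m->len;

put_failure:
	msg_nest_cancel(m, nest);      /* trims the nest only if it was started */
	return -1;
}
```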
--
Ben Pfaff
http://benpfaff.org
^ permalink raw reply
* Re: question about nla_nest_cancel
From: Julia Lawall @ 2011-01-27 17:25 UTC (permalink / raw)
To: Kurt Van Dijck; +Cc: hadi, netdev
In-Reply-To: <20110127172118.GA331@e-circ.dyndns.org>
On Thu, 27 Jan 2011, Kurt Van Dijck wrote:
> On Thu, Jan 27, 2011 at 06:08:34PM +0100, Julia Lawall wrote:
> >
> > I find numerous occurrences of code like the following, in which nest ends
> > up with the value NULL and then nla_nest_cancel is called with nest as the
> > second argument. But nla_nest_cancel just calls nlmsg_trim with the same
> > second argument, and nlmsg_trim does nothing if its second argument is
> > NULL. Is there any reason to keep these calls?
> I just learned this:
> nla_nest_start() adds data to the skb.
> nla_nest_end() 'commits' the proper length.
> nla_nest_cancel() reverts skb to the state before nla_nest_start(),
> as if nothing happened.
Yes, I can see this as well. But in this case, it seems to me that
nothing has happened, because nla_nest_start has returned NULL?
julia
^ permalink raw reply
* Re: question about nla_nest_cancel
From: Kurt Van Dijck @ 2011-01-27 17:21 UTC (permalink / raw)
To: Julia Lawall; +Cc: hadi, netdev
In-Reply-To: <Pine.LNX.4.64.1101271804530.13796@pc-004.diku.dk>
On Thu, Jan 27, 2011 at 06:08:34PM +0100, Julia Lawall wrote:
>
> I find numerous occurrences of code like the following, in which nest ends
> up with the value NULL and then nla_nest_cancel is called with nest as the
> second argument. But nla_nest_cancel just calls nlmsg_trim with the same
> second argument, and nlmsg_trim does nothing if its second argument is
> NULL. Is there any reason to keep these calls?
I just learned this:
nla_nest_start() adds data to the skb.
nla_nest_end() 'commits' the proper length.
nla_nest_cancel() reverts skb to the state before nla_nest_start(),
as if nothing happened.
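The three calls described above, sketched on a hypothetical byte buffer (`struct buf` and the `buf_*()` helpers are illustrative stand-ins; the real nla_nest_*() functions operate on an skb and a struct nlattr header): start appends a length placeholder and remembers where the nest begins, end writes the final length into it, and cancel truncates back to the remembered position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte buffer standing in for an skb. */
struct buf {
	uint8_t data[64];
	size_t len;
};

/* Start: append a 2-byte length placeholder, remember its position. */
static uint8_t *buf_nest_start(struct buf *b)
{
	uint8_t *start;

	if (b->len + 2 > sizeof(b->data))
		return NULL;
	start = b->data + b->len;
	memset(start, 0, 2);
	b->len += 2;
	return start;
}

/* End: "commit" the nest by storing its total length
 * (header + payload) in the placeholder. */
static void buf_nest_end(struct buf *b, uint8_t *start)
{
	uint16_t n = (uint16_t)(b->data + b->len - start);

	memcpy(start, &n, sizeof(n));
}

/* Cancel: drop everything from the nest header on, as if
 * buf_nest_start() had never been called. */
static void buf_nest_cancel(struct buf *b, uint8_t *start)
{
	if (start)
		b->len = (size_t)(start - b->data);
}

/* Append payload bytes; the caller is responsible for checking space. */
static void buf_append(struct buf *b, const void *p, size_t n)
{
	memcpy(b->data + b->len, p, n);
	b->len += n;
}
```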
Kurt
^ permalink raw reply
* question about nla_nest_cancel
From: Julia Lawall @ 2011-01-27 17:08 UTC (permalink / raw)
To: hadi, netdev
I find numerous occurrences of code like the following, in which nest ends
up with the value NULL and then nla_nest_cancel is called with nest as the
second argument. But nla_nest_cancel just calls nlmsg_trim with the same
second argument, and nlmsg_trim does nothing if its second argument is
NULL. Is there any reason to keep these calls?
thanks,
julia
static int tbf_dump(struct Qdisc *sch, struct sk_buff *skb)
{
struct tbf_sched_data *q = qdisc_priv(sch);
struct nlattr *nest;
struct tc_tbf_qopt opt;
nest = nla_nest_start(skb, TCA_OPTIONS);
if (nest == NULL)
goto nla_put_failure;
opt.limit = q->limit;
opt.rate = q->R_tab->rate;
if (q->P_tab)
opt.peakrate = q->P_tab->rate;
else
memset(&opt.peakrate, 0, sizeof(opt.peakrate));
opt.mtu = q->mtu;
opt.buffer = q->buffer;
NLA_PUT(skb, TCA_TBF_PARMS, sizeof(opt), &opt);
nla_nest_end(skb, nest);
return skb->len;
nla_put_failure:
nla_nest_cancel(skb, nest);
return -1;
}
^ permalink raw reply
* Re: Realtek r8168C / r8169 driver VLAN TAG stripping
From: Francois Romieu @ 2011-01-27 16:50 UTC (permalink / raw)
To: Anand Raj Manickam; +Cc: netdev, Hayes
In-Reply-To: <AANLkTi=OwsMO8x9AOy=MmohU4SSQcv+o=TvwNs0NNsQR@mail.gmail.com>
Anand Raj Manickam <anandrm@gmail.com> :
> On Thu, Jan 27, 2011 at 8:37 PM, Francois Romieu <romieu@fr.zoreil.com> wrote:
> > Anand Raj Manickam <anandrm@gmail.com> :
[...]
> > - ip addr show
>
> 3: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
> link/ether 00:17:54:00:f6:62 brd ff:ff:ff:ff:ff:ff
> inet 172.16.1.1/16 brd 172.16.255.255 scope global eth0
> inet6 fe80::217:54ff:fe00:f662/64 scope link
> valid_lft forever preferred_lft forever
>
> 8: eth0.50@eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc noqueue
> link/ether 00:17:54:00:f6:62 brd ff:ff:ff:ff:ff:ff
> inet 172.16.10.10/24 brd 172.16.10.255 scope global eth0.50
> inet6 fe80::217:54ff:fe00:f662/64 scope link
> valid_lft forever preferred_lft forever
Could you try again after issuing :
ip addr del 172.16.1.1/16 brd 172.16.255.255 dev eth0
then send the unabbreviated "ip addr show" and "ip route show all" if
things do not perform better.
(no iptables / ip rules wizardry, right ?)
[...]
> > - ethtool -k eth0
>
> Offload parameters for eth0:
> rx-checksumming: on
> tx-checksumming: off
> scatter-gather: off
> tcp segmentation offload: off
> udp fragmentation offload: off
> generic segmentation offload: off
Ok.
[...]
> > I do not get the "VLAN tag gets stripped" concept, especially on Tx.
> > Does it mean "no packet" or "a packet whose content is wrong" ?
>
> Sorry for not being clear ;-)
>
> When we transmit a packet with a VLAN TAG, the TAG gets stripped when
> it is transmitted through the device; the trunk port / sniffer at the
> other end does NOT see a TAG.
> Similarly, when a VLAN-tagged packet is sent from the other end, the
> TAG gets stripped by the device; we DO NOT see the tag.
But the data flows in both directions, right ?
> I use tcpdump -i eth0 -n -nn -e vlan 50
> to see whether the packets are getting tagged or NOT.
>
> The same config works on forcedeth
What do you call "same config" ?
I am mildly convinced that your config is simple enough to isolate a
driver level vlan problem.
--
Ueimor
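For reference on the Tx side of this report: on the wire, an 802.1Q tag is 4 bytes inserted right after the source MAC address, a TPID of 0x8100 followed by a 16-bit TCI (3-bit priority, 1-bit CFI/DEI, 12-bit VLAN ID). Below is a minimal sketch of software tag insertion on a plain byte buffer, showing what a correctly tagged frame should look like. `vlan_insert_tag` and its buffer-capacity convention are hypothetical, not the kernel's sk_buff API; a driver such as r8169 may instead ask the NIC to insert the tag through its Tx descriptor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN_SKETCH 6       /* bytes per MAC address */
#define VLAN_TPID_8021Q 0x8100  /* IEEE 802.1Q tag protocol identifier */
#define VLAN_TAG_LEN 4          /* TPID (2 bytes) + TCI (2 bytes) */

/*
 * Hypothetical helper: insert an 802.1Q tag, in place, into the untagged
 * Ethernet frame held in buf[0..len). buf must have room for cap bytes.
 * Returns the new frame length, or 0 if the input is rejected.
 */
size_t vlan_insert_tag(uint8_t *buf, size_t len, size_t cap,
                       uint16_t vid, uint8_t pcp)
{
    const size_t off = 2 * ETH_ALEN_SKETCH;  /* tag goes after dst+src MACs */
    uint16_t tci;

    assert(buf != NULL);
    if (len < off + 2 || cap < len + VLAN_TAG_LEN || vid > 0x0FFF || pcp > 7)
        return 0;

    /* Shift the original EtherType and payload right by 4 bytes. */
    memmove(buf + off + VLAN_TAG_LEN, buf + off, len - off);

    /* TCI = PCP (3 bits) | CFI/DEI (1 bit, left 0) | VID (12 bits). */
    tci = (uint16_t)((pcp << 13) | vid);

    /* Both fields are written in network byte order. */
    buf[off + 0] = (uint8_t)(VLAN_TPID_8021Q >> 8);
    buf[off + 1] = (uint8_t)(VLAN_TPID_8021Q & 0xFF);
    buf[off + 2] = (uint8_t)(tci >> 8);
    buf[off + 3] = (uint8_t)(tci & 0xFF);
    return len + VLAN_TAG_LEN;
}
```

Tagging a 15-byte IPv4 frame with VLAN 50 yields 19 bytes whose bytes 12..15 are `81 00 00 32`, followed by the original EtherType `08 00`. A sniffer on the trunk port should see exactly those 4 extra bytes.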
* Re: SO_REUSEPORT - can it be done in kernel?
From: Bill Sommerfeld @ 2011-01-27 15:55 UTC (permalink / raw)
To: Daniel Baluta; +Cc: therbert, netdev
In-Reply-To: <AANLkTimDtaV=WhZUUEivg3_vEUeUk3_WQSs09h7USiUj@mail.gmail.com>
On Thu, Jan 27, 2011 at 02:07, Daniel Baluta <daniel.baluta@gmail.com> wrote:
> How did you solve the issue of scaling TCP listeners?
> I think SO_REUSEPORT as proposed by patch [1] can be a good
> start. Were there any follow-ups?
Google is using the patch internally. I've recently joined Google and
have picked up this work from Tom; I'm starting to rework how it
interacts with TCP (in particular, changing how it interacts with
request sockets and listen sockets so that incoming connections are
not prematurely bound to a specific listener sharing the port). I
have nothing worth sharing yet.
* Re: netconsole build breakage (Re: [GIT] Networking)
From: Ingo Molnar @ 2011-01-27 15:51 UTC (permalink / raw)
To: Nicholas A. Bellinger
Cc: Américo Wang, David Miller, James Bottomley, Randy Dunlap,
torvalds, akpm, netdev, linux-kernel, Joel Becker
In-Reply-To: <1295433231.21351.17.camel@haakon2.linux-iscsi.org>
* Nicholas A. Bellinger <nab@linux-iscsi.org> wrote:
> On Wed, 2011-01-19 at 18:08 +0800, Américo Wang wrote:
> > On Wed, Jan 19, 2011 at 10:59:20AM +0100, Ingo Molnar wrote:
> > >
> > >FYI, there's a .38-rc1 build failure that triggers rather often:
> > >
> > > drivers/built-in.o: In function `drop_netconsole_target':
> > > netconsole.c:(.text+0x130146): undefined reference to `config_item_put'
> > > drivers/built-in.o: In function `write_msg':
> > > netconsole.c:(.text+0x1301aa): undefined reference to `config_item_get'
> > > netconsole.c:(.text+0x130217): undefined reference to `config_item_put'
> > > drivers/built-in.o: In function `netconsole_netdev_event':
> > > netconsole.c:(.text+0x1302ab): undefined reference to `config_item_get'
> > > ...
> > >
> > >Triggered by this configuration:
> > >
> > > CONFIG_CONFIGFS_FS=m
> > > CONFIG_NETCONSOLE=y
> > >
> >
> > Should be "depends on CONFIGFS_FS=y".
>
> Sorry for breaking this one, folks.
>
> Where this was left yesterday was to change NETCONSOLE_DYNAMIC, DLM and
> OCFS2_FS symbols to use 'select configfs' instead of 'depends on SYSFS
> && CONFIGFS':
>
> http://marc.info/?l=linux-kernel&m=129539400709508&w=2
>
> but unfortunately this did not make it into .38-rc1 in time.
>
> Using 'select CONFIGFS_FS' here for NETCONSOLE_DYNAMIC with the
> following patches should do the trick.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/nab/scsi-post-merge-2.6.git for-linus
>
> Thanks,
>
> Nicholas Bellinger (3):
> net: Make NETCONSOLE_DYNAMIC use select CONFIGFS_FS
> dlm: Make DLM use select CONFIGFS_FS
> ocfs2: Make OCFS2_FS use select CONFIGFS_FS
>
> drivers/net/Kconfig | 3 ++-
> fs/dlm/Kconfig | 4 ++--
> fs/ocfs2/Kconfig | 3 ++-
> 3 files changed, 6 insertions(+), 4 deletions(-)
Ping? This is still broken in Linus's tree as of today; simple builds like
allmodconfig still fail.
Thanks,
Ingo
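The failure above can be reproduced in a toy model of Kconfig's tristate values (n < m < y). This is a simplified sketch, not the logic in scripts/kconfig/. The function names are hypothetical, and the model assumes NETCONSOLE_DYNAMIC is a bool whose "depends on" expression reduces to CONFIGFS_FS's value, as in the 'depends on SYSFS && CONFIGFS' line quoted above. A bool cannot hold m, so an m dependency lets it become y. Built-in netconsole code then calls config_item_get/config_item_put, which live in a module. 'select CONFIGFS_FS' avoids this by forcing configfs up to the selector's value.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of Kconfig tristate values; not Kconfig itself. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

static enum tristate tri_min(enum tristate a, enum tristate b)
{
    return a < b ? a : b;
}

/*
 * Value a *bool* symbol ends up with when the user asks for `want`
 * and its "depends on" expression evaluates to `dep`. The value is
 * capped at the dependency, but a bool cannot be 'm', so an 'm'
 * dependency is promoted to 'y' (the hole hit in this thread).
 */
enum tristate bool_with_depends(enum tristate want, enum tristate dep)
{
    enum tristate v = tri_min(want, dep);
    return v == TRI_M ? TRI_Y : v;
}

/* "select" raises the selected symbol to at least the selector's value. */
enum tristate apply_select(enum tristate selected, enum tristate selector)
{
    return selected > selector ? selected : selector;
}

/* Built-in ('y') code that calls into a module ('m') cannot link. */
bool links_ok(enum tristate caller, enum tristate callee)
{
    assert(caller <= TRI_Y && callee <= TRI_Y);
    return !(caller == TRI_Y && callee == TRI_M);
}
```

With the reported configuration (CONFIGFS_FS=m, NETCONSOLE=y), `bool_with_depends(TRI_Y, TRI_M)` yields `TRI_Y` and `links_ok(TRI_Y, TRI_M)` is false, which matches the undefined references. `apply_select(TRI_M, TRI_Y)` returns `TRI_Y`, and the link check then passes.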
* Re: Realtek r8168C / r8169 driver VLAN TAG stripping
From: Anand Raj Manickam @ 2011-01-27 15:31 UTC (permalink / raw)
To: Francois Romieu; +Cc: netdev, Hayes
In-Reply-To: <20110127150744.GA7925@electric-eye.fr.zoreil.com>
On Thu, Jan 27, 2011 at 8:37 PM, Francois Romieu <romieu@fr.zoreil.com> wrote:
> Anand Raj Manickam <anandrm@gmail.com> :
> [...]
>> We upgraded to the 2.6.36 kernel. The result is the SAME.
>> The VLAN tag gets stripped ;-)
>> Do let me know if you need more info.
>
> - ip addr show
3: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:17:54:00:f6:62 brd ff:ff:ff:ff:ff:ff
inet 172.16.1.1/16 brd 172.16.255.255 scope global eth0
inet6 fe80::217:54ff:fe00:f662/64 scope link
valid_lft forever preferred_lft forever
8: eth0.50@eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc noqueue
link/ether 00:17:54:00:f6:62 brd ff:ff:ff:ff:ff:ff
inet 172.16.10.10/24 brd 172.16.10.255 scope global eth0.50
inet6 fe80::217:54ff:fe00:f662/64 scope link
valid_lft forever preferred_lft forever
> - ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
> I do not get the "VLAN tag gets stripped" concept, especially on Tx.
> Does it mean "no packet" or "a packet whose content is wrong" ?
Sorry for not being clear ;-)
When we transmit a packet with a VLAN TAG, the TAG gets stripped when
it is transmitted through the device; the trunk port / sniffer at the
other end does NOT see a TAG.
Similarly, when a VLAN-tagged packet is sent from the other end, the
TAG gets stripped by the device; we DO NOT see the tag.
I use tcpdump -i eth0 -n -nn -e vlan 50
to see whether the packets are getting tagged or NOT.
The same config works on forcedeth
Thanks,
Anand
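You can also check whether a captured frame carries a tag by reading its raw bytes directly: a tagged frame has 0x8100 where the EtherType would normally be (bytes 12..13), and the low 12 bits of the next two bytes give the VLAN ID. Below is a minimal sketch of that check, assuming the frame is available as a plain byte buffer, for example from a packet socket. `frame_vlan_id` is hypothetical. It considers only TPID 0x8100, and capture filters may accept other TPIDs as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLAN_TPID_8021Q 0x8100

/*
 * Return the 12-bit VLAN ID if the Ethernet frame carries an 802.1Q tag
 * directly after the source MAC, or -1 if it is untagged or too short.
 */
int frame_vlan_id(const uint8_t *frame, size_t len)
{
    uint16_t ethertype, tci;

    assert(frame != NULL);
    if (len < 18)            /* 2 MACs + TPID + TCI + inner EtherType */
        return -1;
    ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    if (ethertype != VLAN_TPID_8021Q)
        return -1;
    tci = (uint16_t)((frame[14] << 8) | frame[15]);
    return tci & 0x0FFF;     /* drop the priority and CFI/DEI bits */
}
```

A frame whose bytes 12..15 are `81 00 a0 32` yields 50 (priority 5, VLAN 50). An untagged IPv4 frame (EtherType 0x0800 at bytes 12..13) yields -1.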
* skb_split in tcp_retransmit_skb question
From: Sergey Senozhatsky @ 2011-01-27 15:20 UTC (permalink / raw)
To: David S. Miller
Cc: Alexey Kuznetsov, Eric Dumazet, Pekka Savola (ipv6), netdev,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1190 bytes --]
Hello,
Suppose we have the following scenario:
tcp_write_timer ->
tcp_retransmit_skb
In tcp_retransmit_skb, `if (skb->len > cur_mss)' evaluates to true, which leads
to a tcp_fragment(sk, skb, cur_mss, cur_mss) call. tcp_fragment calls skb_split(skb, buff, len),
which in turn calls skb_split_no_header(skb, skb1, len, pos), where
`skb_shinfo(skb)->nr_frags++' is executed inside the `for (i = 0; i < nfrags; i++)' loop.
Now we fall back to:
tcp_retransmit_skb ->
tcp_transmit_skb ->
pskb_copy(skb, gfp_mask)
In pskb_copy we iterate over nr_frags:
	if (skb_shinfo(skb)->nr_frags) {
		int i;
		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
			skb_shinfo(n)->frags[i] = skb_shinfo(skb)->frags[i];
			get_page(skb_shinfo(n)->frags[i].page);
		}
		skb_shinfo(n)->nr_frags = i;
	}
The problem here is that nr_frags was increased in skb_split, yet no new page was allocated,
so get_page(skb_shinfo(n)->frags[i].page) is effectively get_page(NULL):
mov (%rdx), %eax
where %rdx is 0x00
Please correct me if I'm missing something.
Sergey
[-- Attachment #2: Type: application/pgp-signature, Size: 316 bytes --]
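It may help to model the two loops on a toy fragment array. This is a simplified sketch, not net/core/skbuff.c. The struct names and both functions are hypothetical, and the model *assumes* that the split rebuilds the head's nr_frags from zero before its loop. Under that assumption, the increment quoted above counts fragments the head keeps rather than adding new slots. Check the actual skb_split_no_header before drawing conclusions from the model.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAGS 8

/* Toy stand-ins for a page and an skb's fragment array (hypothetical). */
struct page_model { int refcount; };
struct frag_model { struct page_model *page; int offset; int size; };
struct shinfo_model { int nr_frags; struct frag_model frags[MAX_FRAGS]; };

/*
 * Split the fragments of `head` at byte `len` (counted from the start of
 * the fragment data), moving bytes >= len into `tail`. The head's count
 * is rebuilt from 0 and incremented once per fragment it keeps, so every
 * slot below nr_frags still points at a real page. A fragment that
 * straddles `len` is shared by both sides, so its page gains a reference.
 */
void split_frags(struct shinfo_model *head, struct shinfo_model *tail, int len)
{
    int i, k = 0, pos = 0, nfrags = head->nr_frags;

    head->nr_frags = 0;                      /* rebuilt below */
    for (i = 0; i < nfrags; i++) {
        int size = head->frags[i].size;

        if (pos + size > len) {              /* (part of) frag goes to tail */
            tail->frags[k] = head->frags[i];
            if (pos < len) {                 /* straddles: share the page */
                head->frags[i].page->refcount++;
                tail->frags[k].offset += len - pos;
                tail->frags[k].size -= len - pos;
                head->frags[i].size = len - pos;
                head->nr_frags++;
            }
            k++;
        } else {
            head->nr_frags++;                /* frag stays entirely in head */
        }
        pos += size;
    }
    tail->nr_frags = k;
}

/* pskb_copy-style loop: copy descriptors, taking a page ref for each. */
void copy_frags(struct shinfo_model *dst, const struct shinfo_model *src)
{
    int i;

    for (i = 0; i < src->nr_frags; i++) {
        dst->frags[i] = src->frags[i];
        assert(dst->frags[i].page != NULL);  /* get_page(NULL) would oops */
        dst->frags[i].page->refcount++;
    }
    dst->nr_frags = i;
}
```

With three 100-byte fragments split at byte 150, the head keeps two fragments (the second trimmed to 50 bytes) and the tail gets two. The shared page's refcount rises to 2, and a following copy_frags over the head touches only valid pages.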
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox