* Re: [PATCH RFC] virtio_net: use NAPI for xmit (UNTESTED)
From: Shirley Ma @ 2010-03-31 5:58 UTC (permalink / raw)
To: Rusty Russell; +Cc: Michael S. Tsirkin, netdev, Herbert Xu
In-Reply-To: <1270014284.25337.3.camel@localhost.localdomain>
Backported it and prepared it for more tests.
Shirley
^ permalink raw reply
* Re: [PATCH 0/6] tagged sysfs support
From: Kay Sievers @ 2010-03-31 5:51 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Greg Kroah-Hartman, Greg KH, linux-kernel, Tejun Heo,
Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
Serge Hallyn, netdev
In-Reply-To: <m14ojxh0mz.fsf@fess.ebiederm.org>
On Wed, Mar 31, 2010 at 01:04, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Kay Sievers <kay.sievers@vrfy.org> writes:
>> On Tue, Mar 30, 2010 at 20:30, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>>
>>> The main shortcoming of using multiple network namespaces today
>>> is that only network devices for the primary network namespace
>>> can be put in the kobject layer and sysfs.
>>>
>>> This is essentially the earlier version of this patchset that was
>>> reviewed before, just now on top of a version of sysfs that doesn't
>>> need cleanup patches to support it.
>>
>> Just to check if we are not in conflict with planned changes, and how
>> to possibly handle them:
>>
>> There is the plan and ongoing work to unify classes and buses, export
>> them at /sys/subsystem in the same layout of the current /sys/bus/.
>> The decision to export buses and classes as two different things
>> (which they aren't) is the last major piece in the sysfs layout which
>> needs to be fixed.
>
> Interesting. We will have symlinks, i.e.:
> /sys/class -> /sys/subsystem
> /sys/bus -> /sys/subsystem
> to keep from breaking userspace.
Yeah. /sys/bus/ is the only sane layout of the three needlessly
different versions of the same thing (bus, class, block).
/sys/bus/<subsys> can just be plain symlinks to the
/sys/subsystem/<subsys> directories.
/sys/class/<subsys> *could* be a symlink to the
/sys/subsystem/<subsys>/devices/ directory, but we really don't want
to continue to stupidly mix subsystem-wide control files with device
lists anymore. The "devices" directory needs to be a strict list of
devices, not the collection of random stuff that it is today. :)
So we either leave all the conceptually broken class attributes behind
us, and put them at the /sys/subsystem/<subsys>/ level only, or we
need to create the /sys/class/<subsys>/* stuff all as symlinks like we
do today. I expect we will have to create /sys/class as we do today.
Another problem to solve is that sysfs does not allow us to symlink
regular files, only directories, so we currently cannot create the
class-wide attributes as symlinks to the proper file in
/sys/subsystem/.
>> It would mean that /sys/subsystem/net/devices/* would look like
>> /sys/class/net/* today. But the /sys/subsystem/net/ directory could
>> hold global network-subsystem-wide control files which would need to be
>> namespaced too. (The network subsystem does not use subsystem-global
>> files today, but a bunch of other classes do.)
>>
>> Could this be modeled with the current way of doing sysfs namespaces?
>> A /sys/bus/<subsystem>/ directory hierarchy would need to be
>> namespaced, not just a single plain directory with symlinks. Would
>> that work?
>
> I'm not entirely clear on what you are doing but it all sounds like it
> will fit within what I am doing.
The goal is to unify the 3 needlessly different versions of "device
lists of the same subsystem". We have /sys/class, /sys/bus,
/sys/block, and all of them will be unified at /sys/subsystem/ leaving
the old names as compat links only. Unlike block and class, the
/sys/subsystem/<subsys> directory can be extended with custom
subdirectories and files, without mixing random files into device
lists.
With /sys/subsystem/, userspace can uniquely identify and find all
devices at /sys/subsystem/<subsys>/devices/<device-name>/ with only the
subsystem and the device name.
All devices in /sys/devices already have a symlink called "subsystem"
which will point back to the corresponding /sys/subsystem/<subsys>
directory, and the event environment already contains a variable
SUBSYSTEM with the name.
That would be the first time sysfs device interfaces have some idea of
consistency. :)
> Right now I have /sys/class/net,
> /sys/devices/virtual/net and a bunch of other net directories becoming
> tagged and only showing up in the appropriately mounted sysfs. We
> track them all in the class kset and as long as we extend that capability
> when the subsystem change happens in sysfs all should be well.
Ok, sounds good.
> Today we have /sys/class/net/bonding_master. For now I have that as
> an untagged file, but the implementation is aware of which network namespace
> your current process is in. Thinking about that a little more it
> would be better to make that file tagged so that userspace can see
> different versions for the different network namespaces. Joy.
Yeah, that might make more sense in the end.
> I expect other control files will be the same.
Sounds like, yes.
> In general it doesn't make sense to add control files for networking,
> as they easily conflict with legal network device names and thus create
> the possibility of breaking someone's userspace.
Yeah, it did not make sense in the first place to mix device lists
with global attributes. It's a real mess what people do in sysfs.
Thanks,
Kay
^ permalink raw reply
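The lookup Kay describes above, finding a device from only its subsystem and
device name, can be sketched in userspace C. This is a minimal sketch under
stated assumptions: the /sys/subsystem layout is the proposal from this thread,
and the helper name subsys_device_path() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/subsystem/<subsys>/devices/<dev>" into buf.
 * Returns 0 on success, -1 if buf is too small to hold the path.
 * The /sys/subsystem layout is the proposal discussed in the thread;
 * this helper is illustrative only.
 */
static int subsys_device_path(char *buf, size_t len,
			      const char *subsys, const char *dev)
{
	int n;

	/* Passing NULL here is a programming error, not a runtime condition. */
	assert(buf != NULL && subsys != NULL && dev != NULL);

	n = snprintf(buf, len, "/sys/subsystem/%s/devices/%s", subsys, dev);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* output error or truncated path */
	return 0;
}
```

The point of the design is that the caller needs no knowledge of whether the
device historically lived under /sys/class, /sys/bus or /sys/block.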
* Re: [PATCH 1/6] sysfs: Basic support for multiple super blocks
From: Eric W. Biederman @ 2010-03-31 5:51 UTC (permalink / raw)
To: Tejun Heo
Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Cornelia Huck,
linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
netdev
In-Reply-To: <4BB2E098.7030202@kernel.org>
Tejun Heo <tj@kernel.org> writes:
> Hello, Eric.
>
> On 03/31/2010 03:31 AM, Eric W. Biederman wrote:
>> From: Eric W. Biederman <ebiederm@xmission.com>
>>
>> Add all of the necessary bioler plate to support
> boiler :-)
>
>> +static int sysfs_test_super(struct super_block *sb, void *data)
>> +{
>> + struct sysfs_super_info *sb_info = sysfs_info(sb);
>> + struct sysfs_super_info *info = data;
>> + int found = 1;
>> + return found;
>> +}
>
> Can you please make it return bool?
Nope. That would mean I could not use it with sget.
>> static int sysfs_get_sb(struct file_system_type *fs_type,
>> int flags, const char *dev_name, void *data, struct vfsmount *mnt)
>> {
>> - return get_sb_single(fs_type, flags, data, sysfs_fill_super, mnt);
>> + struct sysfs_super_info *info;
>> + struct super_block *sb;
>> + int error;
>> +
>> + error = -ENOMEM;
>> + info = kzalloc(sizeof(*info), GFP_KERNEL);
>> + if (!info)
>> + goto out;
>> + sb = sget(fs_type, sysfs_test_super, sysfs_set_super, info);
>> + if (IS_ERR(sb) || sb->s_fs_info != info)
>> + kfree(info);
>> + if (IS_ERR(sb)) {
>> + kfree(info);
>> + error = PTR_ERR(sb);
>> + goto out;
>> + }
>> + if (!sb->s_root) {
>> + sb->s_flags = flags;
>> + error = sysfs_fill_super(sb, data, flags & MS_SILENT ? 1 : 0);
>> + if (error) {
>> + deactivate_locked_super(sb);
>> + goto out;
>> + }
>> + sb->s_flags |= MS_ACTIVE;
>> + }
>> +
>> + simple_set_mnt(mnt, sb);
>> + error = 0;
>> +out:
>> + return error;
>> +}
>
> I haven't looked at later patches but I suppose this is gonna be
> filled with more meaningful stuff later.
Yes it will.
> One (possibly silly) thing
> that stands out compared to get_sb_single() is missing remount
> handling. Is it intended?
There is nothing for a remount to do so I ignore it. The only
thing that would possibly be meaningful is a read-only mount,
and nothing I know of sysfs suggests read-only mounts of sysfs
work, or make any sense.
>> index 30f5a44..030a39d 100644
>> --- a/fs/sysfs/sysfs.h
>> +++ b/fs/sysfs/sysfs.h
>> @@ -114,6 +114,9 @@ struct sysfs_addrm_cxt {
>> /*
>> * mount.c
>> */
>> +struct sysfs_super_info {
>> +};
>> +#define sysfs_info(SB) ((struct sysfs_super_info *)(SB->s_fs_info))
>
> Another nit picking. It would be better to wrap SB in the macro
> definition. Also, wouldn't an inline function be better?
Good spotting. That doesn't bite today but it will certainly bite
someday if it isn't fixed.
I wonder how that has slipped through the review all of this time.
Eric
^ permalink raw reply
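The macro issue Tejun raises can be illustrated with a small userspace sketch.
The struct definitions below are stand-ins (assumptions) for the kernel types,
and the names sysfs_info_unwrapped, sysfs_info_wrapped and sysfs_info_fn are
hypothetical; only the shape of the posted macro comes from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins (assumptions) for the kernel structures. */
struct sysfs_super_info { int placeholder; };	/* empty in the posted patch */
struct super_block { void *s_fs_info; };

/*
 * Shape of the macro as posted: SB is pasted without parentheses, so
 * sysfs_info_unwrapped(sbs + 1) expands to (sbs + 1->s_fs_info) and
 * does not compile. It only works for a plain identifier.
 */
#define sysfs_info_unwrapped(SB) ((struct sysfs_super_info *)(SB->s_fs_info))

/* Wrapped argument: any pointer-valued expression is accepted. */
#define sysfs_info_wrapped(SB) ((struct sysfs_super_info *)((SB)->s_fs_info))

/* Inline-function alternative: the compiler also checks the argument type. */
static inline struct sysfs_super_info *sysfs_info_fn(struct super_block *sb)
{
	assert(sb != NULL);
	return sb->s_fs_info;
}
```

Either the wrapped macro or the inline function accepts an expression such as
sbs + 1; the inline function additionally rejects arguments of the wrong type.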
* Re: [PATCH 5/6] cxgb4: Add main driver file and driver Makefile
From: David Miller @ 2010-03-31 5:50 UTC (permalink / raw)
To: shemminger; +Cc: dm, netdev
In-Reply-To: <20100330141904.5236fe44@nehalam>
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 30 Mar 2010 14:19:04 -0700
> On Tue, 30 Mar 2010 10:52:21 -0800
> Dimitris Michailidis <dm@chelsio.com> wrote:
>
>> +static struct cxgb4_proc_entry proc_files[] = {
>> +#ifdef CONFIG_PROC_FS
>> + { "l2t", 0444, ADAP_NEED_L2T, 0, &t4_l2t_proc_fops },
>> +#endif
>> + { "lb_stats", 0444, 0, 0, &lb_stats_proc_fops },
>> + { "path_mtus", 0644, 0, 0, &mtutab_proc_fops },
>> + { "qstats", 0444, 0, 0, &sge_stats_proc_fops },
>> + { "rss", 0444, 0, 0, &rss_proc_fops },
>> + { "tcp_stats", 0444, 0, 0, &tcp_stats_proc_fops },
>> + { "tids", 0444, ADAP_NEED_OFLD, 0, &tid_info_proc_fops },
>> + { "tp_err_stats", 0444, 0, 0, &tp_err_stats_proc_fops },
>> + { "trace0", 0644, 0, 0, &mps_trc_proc_fops },
>> + { "trace1", 0644, 0, 1, &mps_trc_proc_fops },
>> + { "trace2", 0644, 0, 2, &mps_trc_proc_fops },
>> + { "trace3", 0644, 0, 3, &mps_trc_proc_fops },
>> + { "uld", 0444, 0, 0, &uld_proc_fops },
>> +};
>> +
>
> Do you really need this large number of /proc files?
> It creates another stable API to worry about. If it is just for
> debugging, move it to debugfs, or better yet just drop it.
I also find this a bit too much.
We have all kinds of ways to export whatever statistics you
want. In particular we have ethtool stats of which you
can have as many as you wish.
If necessary, we could add a feature to define "views" of ethtool
stats so that we can have domains of driver specific statistics if the
problem is that you don't want all of these debugging stats to clutter
up the "main" ethtool stats.
^ permalink raw reply
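The ethtool approach suggested above is table-driven: a driver reports a count,
a block of fixed-width names, and an array of 64-bit values (in the kernel, via
the get_sset_count, get_strings and get_ethtool_stats callbacks of struct
ethtool_ops). The following is a userspace sketch of that shape only. The
counter names, struct adapter_stats and the stats_*() helpers are hypothetical
and are not taken from the cxgb4 driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STAT_NAME_LEN 32	/* same width as the kernel's ETH_GSTRING_LEN */

/* Hypothetical per-adapter counters, for illustration only. */
struct adapter_stats {
	uint64_t lb_frames;
	uint64_t tcp_out_rsts;
	uint64_t tp_err_drops;
};

/* One table entry per exported statistic: its name and where it lives. */
struct stat_desc {
	char name[STAT_NAME_LEN];
	size_t offset;
};

static const struct stat_desc stat_table[] = {
	{ "lb_frames",    offsetof(struct adapter_stats, lb_frames) },
	{ "tcp_out_rsts", offsetof(struct adapter_stats, tcp_out_rsts) },
	{ "tp_err_drops", offsetof(struct adapter_stats, tp_err_drops) },
};

#define N_STATS (sizeof(stat_table) / sizeof(stat_table[0]))

/* Number of statistics (the role get_sset_count plays). */
static size_t stats_count(void)
{
	return N_STATS;
}

/* Copy names; out must hold N_STATS * STAT_NAME_LEN bytes (get_strings). */
static void stats_strings(char *out)
{
	size_t i;

	assert(out != NULL);
	for (i = 0; i < N_STATS; i++)
		memcpy(out + i * STAT_NAME_LEN, stat_table[i].name,
		       STAT_NAME_LEN);
}

/* Copy values in table order; out must hold N_STATS entries. */
static void stats_values(const struct adapter_stats *s, uint64_t *out)
{
	size_t i;

	assert(s != NULL && out != NULL);
	for (i = 0; i < N_STATS; i++)
		memcpy(&out[i], (const char *)s + stat_table[i].offset,
		       sizeof(out[i]));
}
```

Adding a counter then means adding one table row rather than one more /proc
file, and no new file name becomes part of a stable interface.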
* Re: [PATCH RFC] virtio_net: use NAPI for xmit (UNTESTED)
From: Shirley Ma @ 2010-03-31 5:44 UTC (permalink / raw)
To: Rusty Russell; +Cc: Michael S. Tsirkin, netdev, Herbert Xu
In-Reply-To: <201003311429.57793.rusty@rustcorp.com.au>
Hello Rusty,
On Wed, 2010-03-31 at 14:29 +1030, Rusty Russell wrote:
> I don't have time to chase this, but it's been sitting in my patch queue
> for a while. Wondered if Michael or Shirley wanted to toy with it
>
Does this patch build on top of net-next-2.6?
Shirley
^ permalink raw reply
* Re: [PATCH RFC] inetpeer: Support ipv6 addresses.
From: Eric W. Biederman @ 2010-03-31 5:44 UTC (permalink / raw)
To: Herbert Xu; +Cc: David Miller, netdev
In-Reply-To: <20100328135931.GA16430@gondor.apana.org.au>
Herbert Xu <herbert@gondor.apana.org.au> writes:
> BTW, it appears that the inetpeer cache doesn't take namespaces
> into account. This means that information could potentially leak
> from one namespace into another. I'm not sure whether that's a
> big deal or not but it's something for the namespaces folks to
> consider.
Bother. I wrote a patch a while back, I even remember people
commenting on it, but it certainly doesn't look like anything
ever made it in.
I will see if I can make some time to dig that up and post
the patch.
Thanks for noticing,
Eric
^ permalink raw reply
* Re: [PATCH 1/6] sysfs: Basic support for multiple super blocks
From: Tejun Heo @ 2010-03-31 5:41 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Cornelia Huck,
linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
netdev
In-Reply-To: <1269973889-25260-1-git-send-email-ebiederm@xmission.com>
Hello, Eric.
On 03/31/2010 03:31 AM, Eric W. Biederman wrote:
> From: Eric W. Biederman <ebiederm@xmission.com>
>
> Add all of the necessary bioler plate to support
boiler :-)
> +static int sysfs_test_super(struct super_block *sb, void *data)
> +{
> + struct sysfs_super_info *sb_info = sysfs_info(sb);
> + struct sysfs_super_info *info = data;
> + int found = 1;
> + return found;
> +}
Can you please make it return bool?
> static int sysfs_get_sb(struct file_system_type *fs_type,
> int flags, const char *dev_name, void *data, struct vfsmount *mnt)
> {
> - return get_sb_single(fs_type, flags, data, sysfs_fill_super, mnt);
> + struct sysfs_super_info *info;
> + struct super_block *sb;
> + int error;
> +
> + error = -ENOMEM;
> + info = kzalloc(sizeof(*info), GFP_KERNEL);
> + if (!info)
> + goto out;
> + sb = sget(fs_type, sysfs_test_super, sysfs_set_super, info);
> + if (IS_ERR(sb) || sb->s_fs_info != info)
> + kfree(info);
> + if (IS_ERR(sb)) {
> + kfree(info);
> + error = PTR_ERR(sb);
> + goto out;
> + }
> + if (!sb->s_root) {
> + sb->s_flags = flags;
> + error = sysfs_fill_super(sb, data, flags & MS_SILENT ? 1 : 0);
> + if (error) {
> + deactivate_locked_super(sb);
> + goto out;
> + }
> + sb->s_flags |= MS_ACTIVE;
> + }
> +
> + simple_set_mnt(mnt, sb);
> + error = 0;
> +out:
> + return error;
> +}
I haven't looked at later patches but I suppose this is gonna be
filled with more meaningful stuff later. One (possibly silly) thing
that stands out compared to get_sb_single() is missing remount
handling. Is it intended?
> index 30f5a44..030a39d 100644
> --- a/fs/sysfs/sysfs.h
> +++ b/fs/sysfs/sysfs.h
> @@ -114,6 +114,9 @@ struct sysfs_addrm_cxt {
> /*
> * mount.c
> */
> +struct sysfs_super_info {
> +};
> +#define sysfs_info(SB) ((struct sysfs_super_info *)(SB->s_fs_info))
Another nit picking. It would be better to wrap SB in the macro
definition. Also, wouldn't an inline function be better?
Thanks.
--
tejun
^ permalink raw reply
* Re: rps: keep the old behavior on SMP without rps
From: David Miller @ 2010-03-31 5:40 UTC (permalink / raw)
To: xiaosuo; +Cc: therbert, netdev
In-Reply-To: <4BB2DD95.90402@gmail.com>
From: Changli Gao <xiaosuo@gmail.com>
Date: Wed, 31 Mar 2010 13:28:53 +0800
> keep the old behavior on SMP without rps
>
> RPS introduces a lock operation on the per-cpu variable input_pkt_queue on
> SMP whether RPS is enabled or not. On SMP without RPS, this lock isn't
> needed at all.
>
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Instead of peppering the file with lots of ifdefs, encapsulate
the thing that's changing into a set of inline functions.
^ permalink raw reply
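A minimal sketch of the suggestion above, assuming userspace stand-ins for the
kernel primitives: the CONFIG_RPS conditional is confined to two inline helpers
so that callers carry no #ifdefs. The names backlog_lock_irq(),
backlog_unlock_irq(), mock_lock and softnet_data_mock are hypothetical and are
not taken from the patch or from net/core/dev.c.

```c
#include <assert.h>

/* Build with -DCONFIG_RPS to model a kernel with RPS enabled. */

/* Userspace stand-ins (assumptions), not the kernel's types. */
struct mock_lock { int held; };

struct softnet_data_mock {
	struct mock_lock lock;	/* models input_pkt_queue.lock */
	int irqs_off;		/* models the local interrupt state */
	int qlen;		/* models input_pkt_queue.qlen */
};

static inline void mock_spin_lock(struct mock_lock *l)
{
	assert(!l->held);	/* catches double locking */
	l->held = 1;
}

static inline void mock_spin_unlock(struct mock_lock *l)
{
	assert(l->held);	/* catches unlocking a lock never taken */
	l->held = 0;
}

/* All knowledge of CONFIG_RPS lives in these two helpers. */
static inline void backlog_lock_irq(struct softnet_data_mock *sd)
{
	sd->irqs_off = 1;		/* models local_irq_disable() */
#ifdef CONFIG_RPS
	mock_spin_lock(&sd->lock);	/* other CPUs may enqueue here */
#endif
}

static inline void backlog_unlock_irq(struct softnet_data_mock *sd)
{
#ifdef CONFIG_RPS
	mock_spin_unlock(&sd->lock);
#endif
	sd->irqs_off = 0;		/* models local_irq_enable() */
}

/* The caller is free of #ifdefs, including on the empty-queue path. */
static int dequeue_one(struct softnet_data_mock *sd)
{
	int got = 0;

	backlog_lock_irq(sd);
	if (sd->qlen > 0) {
		sd->qlen--;
		got = 1;
	}
	backlog_unlock_irq(sd);
	return got;
}
```

With helpers like these, every exit path releases exactly what it acquired. By
contrast, the !skb branch of process_backlog() in the posted patch appears to
keep an unconditional spin_unlock_irq() even when CONFIG_RPS is off.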
* Re: [PATCH] virtio_net: avoid BUG_ON() with large packets when CONFIG_DEBUG_SG=y
From: Shirley Ma @ 2010-03-31 5:39 UTC (permalink / raw)
To: Rusty Russell; +Cc: David Miller, netdev, mst
In-Reply-To: <201003311105.19014.rusty@rustcorp.com.au>
On Wed, 2010-03-31 at 11:05 +1030, Rusty Russell wrote:
> Shirley, please cc me in future.
I will. I thought you might be on vacation. :)
Shirley
^ permalink raw reply
* rps: keep the old behavior on SMP without rps
From: Changli Gao @ 2010-03-31 5:28 UTC (permalink / raw)
To: David S. Miller; +Cc: Tom Herbert, xiaosuo, netdev
keep the old behavior on SMP without rps
RPS introduces a lock operation on the per-cpu variable input_pkt_queue on
SMP whether RPS is enabled or not. On SMP without RPS, this lock isn't
needed at all.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
net/core/dev.c | 34 ++++++++++++++++++++++++++--------
1 file changed, 26 insertions(+), 8 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 3e7fa16..14ad3b7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2314,13 +2314,19 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu)
local_irq_save(flags);
__get_cpu_var(netdev_rx_stat).total++;
+#ifdef CONFIG_RPS
spin_lock(&queue->input_pkt_queue.lock);
+#endif
if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
if (queue->input_pkt_queue.qlen) {
enqueue:
__skb_queue_tail(&queue->input_pkt_queue, skb);
+#ifdef CONFIG_RPS
spin_unlock_irqrestore(&queue->input_pkt_queue.lock,
flags);
+#else
+ local_irq_restore(flags);
+#endif
return NET_RX_SUCCESS;
}
@@ -2342,7 +2348,9 @@ enqueue:
goto enqueue;
}
+#ifdef CONFIG_RPS
spin_unlock(&queue->input_pkt_queue.lock);
+#endif
__get_cpu_var(netdev_rx_stat).dropped++;
local_irq_restore(flags);
@@ -2767,19 +2775,23 @@ int netif_receive_skb(struct sk_buff *skb)
EXPORT_SYMBOL(netif_receive_skb);
/* Network device is going away, flush any packets still pending */
-static void flush_backlog(struct net_device *dev, int cpu)
+static void flush_backlog(void *arg)
{
- struct softnet_data *queue = &per_cpu(softnet_data, cpu);
+ struct net_device *dev = arg;
+ struct softnet_data *queue = &__get_cpu_var(softnet_data);
struct sk_buff *skb, *tmp;
- unsigned long flags;
- spin_lock_irqsave(&queue->input_pkt_queue.lock, flags);
+#ifdef CONFIG_RPS
+ spin_lock(&queue->input_pkt_queue.lock);
+#endif
skb_queue_walk_safe(&queue->input_pkt_queue, skb, tmp)
if (skb->dev == dev) {
__skb_unlink(skb, &queue->input_pkt_queue);
kfree_skb(skb);
}
- spin_unlock_irqrestore(&queue->input_pkt_queue.lock, flags);
+#ifdef CONFIG_RPS
+ spin_unlock(&queue->input_pkt_queue.lock);
+#endif
}
static int napi_gro_complete(struct sk_buff *skb)
@@ -3092,14 +3104,22 @@ static int process_backlog(struct napi_struct *napi, int quota)
do {
struct sk_buff *skb;
+#ifdef CONFIG_RPS
spin_lock_irq(&queue->input_pkt_queue.lock);
+#else
+ local_irq_disable();
+#endif
skb = __skb_dequeue(&queue->input_pkt_queue);
if (!skb) {
__napi_complete(napi);
spin_unlock_irq(&queue->input_pkt_queue.lock);
break;
}
+#ifdef CONFIG_RPS
spin_unlock_irq(&queue->input_pkt_queue.lock);
+#else
+ local_irq_enable();
+#endif
__netif_receive_skb(skb);
} while (++work < quota && jiffies == start_time);
@@ -5549,7 +5569,6 @@ void netdev_run_todo(void)
while (!list_empty(&list)) {
struct net_device *dev
= list_first_entry(&list, struct net_device, todo_list);
- int i;
list_del(&dev->todo_list);
if (unlikely(dev->reg_state != NETREG_UNREGISTERING)) {
@@ -5561,8 +5580,7 @@ void netdev_run_todo(void)
dev->reg_state = NETREG_UNREGISTERED;
- for_each_online_cpu(i)
- flush_backlog(dev, i);
+ on_each_cpu(flush_backlog, dev, 1);
netdev_wait_allrefs(dev);
^ permalink raw reply related
* Re: [PATCH 1/6] sysfs: Basic support for multiple super blocks
From: Serge E. Hallyn @ 2010-03-31 5:01 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
netdev
In-Reply-To: <20100331050100.GB10144@us.ibm.com>
Quoting Serge E. Hallyn (serue@us.ibm.com):
> Quoting Eric W. Biederman (ebiederm@xmission.com):
> > From: Eric W. Biederman <ebiederm@xmission.com>
> >
> > Add all of the necessary bioler plate to support
> > multiple superblocks in sysfs.
> >
> > Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>
> Acked-by: Serge Hallyn <serue@us.ibm.com>
(with the patch 7/6 of course :)
^ permalink raw reply
* Re: [PATCH 1/6] sysfs: Basic support for multiple super blocks
From: Serge E. Hallyn @ 2010-03-31 5:01 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
netdev
In-Reply-To: <1269973889-25260-1-git-send-email-ebiederm@xmission.com>
Quoting Eric W. Biederman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
>
> Add all of the necessary bioler plate to support
> multiple superblocks in sysfs.
>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
> ---
> fs/sysfs/mount.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
> fs/sysfs/sysfs.h | 3 ++
> 2 files changed, 59 insertions(+), 2 deletions(-)
>
> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index 0cb1088..6a433ac 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -71,16 +71,70 @@ static int sysfs_fill_super(struct super_block *sb, void *data, int silent)
> return 0;
> }
>
> +static int sysfs_test_super(struct super_block *sb, void *data)
> +{
> + struct sysfs_super_info *sb_info = sysfs_info(sb);
> + struct sysfs_super_info *info = data;
> + int found = 1;
> + return found;
> +}
> +
> +static int sysfs_set_super(struct super_block *sb, void *data)
> +{
> + int error;
> + error = set_anon_super(sb, data);
> + if (!error)
> + sb->s_fs_info = data;
> + return error;
> +}
> +
> static int sysfs_get_sb(struct file_system_type *fs_type,
> int flags, const char *dev_name, void *data, struct vfsmount *mnt)
> {
> - return get_sb_single(fs_type, flags, data, sysfs_fill_super, mnt);
> + struct sysfs_super_info *info;
> + struct super_block *sb;
> + int error;
> +
> + error = -ENOMEM;
> + info = kzalloc(sizeof(*info), GFP_KERNEL);
> + if (!info)
> + goto out;
> + sb = sget(fs_type, sysfs_test_super, sysfs_set_super, info);
> + if (IS_ERR(sb) || sb->s_fs_info != info)
> + kfree(info);
> + if (IS_ERR(sb)) {
> + kfree(info);
> + error = PTR_ERR(sb);
> + goto out;
> + }
> + if (!sb->s_root) {
> + sb->s_flags = flags;
> + error = sysfs_fill_super(sb, data, flags & MS_SILENT ? 1 : 0);
> + if (error) {
> + deactivate_locked_super(sb);
> + goto out;
> + }
> + sb->s_flags |= MS_ACTIVE;
> + }
> +
> + simple_set_mnt(mnt, sb);
> + error = 0;
> +out:
> + return error;
> +}
> +
> +static void sysfs_kill_sb(struct super_block *sb)
> +{
> + struct sysfs_super_info *info = sysfs_info(sb);
> +
> + kill_anon_super(sb);
> + kfree(info);
> }
>
> static struct file_system_type sysfs_fs_type = {
> .name = "sysfs",
> .get_sb = sysfs_get_sb,
> - .kill_sb = kill_anon_super,
> + .kill_sb = sysfs_kill_sb,
> };
>
> int __init sysfs_init(void)
> diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
> index 30f5a44..030a39d 100644
> --- a/fs/sysfs/sysfs.h
> +++ b/fs/sysfs/sysfs.h
> @@ -114,6 +114,9 @@ struct sysfs_addrm_cxt {
> /*
> * mount.c
> */
> +struct sysfs_super_info {
> +};
> +#define sysfs_info(SB) ((struct sysfs_super_info *)(SB->s_fs_info))
> extern struct sysfs_dirent sysfs_root;
> extern struct kmem_cache *sysfs_dir_cachep;
>
> --
> 1.6.5.2.143.g8cc62
^ permalink raw reply
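As quoted, the IS_ERR(sb) path in sysfs_get_sb() appears to reach kfree(info)
twice: once in the combined check and once in the error block. The ownership
rule the code seems to aim for is that the caller keeps ownership of info
unless sget() installed it in a brand-new superblock. The sketch below models
that rule in userspace. All names (claim_super, sget_result, free_info) are
hypothetical, and the three outcomes are passed in explicitly instead of
calling a real sget().

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-ins (assumptions) for the kernel types involved. */
struct super_block { void *s_fs_info; };
struct sysfs_super_info { int placeholder; };

/* The three outcomes the caller has to tell apart after sget(). */
enum sget_result { SGET_ERROR, SGET_EXISTING, SGET_NEW };

static int info_frees;	/* counts releases of the info allocation */

static void free_info(struct sysfs_super_info *info)
{
	assert(info != NULL);
	info_frees++;
	free(info);
}

/*
 * Allocate info, then release it exactly once unless a new superblock
 * took ownership of it. Returns 0 on success, -1 on failure.
 */
static int claim_super(enum sget_result result, struct super_block *existing,
		       struct super_block *fresh, struct super_block **out)
{
	struct sysfs_super_info *info = calloc(1, sizeof(*info));

	if (!info)
		return -1;

	switch (result) {
	case SGET_ERROR:
		free_info(info);	/* the only release on the error path */
		return -1;
	case SGET_EXISTING:
		free_info(info);	/* existing sb keeps its own info */
		*out = existing;
		return 0;
	case SGET_NEW:
		fresh->s_fs_info = info;	/* ownership moves to the sb */
		*out = fresh;
		return 0;
	}
	free_info(info);	/* unknown outcome: treat as an error */
	return -1;
}
```

In the new-superblock case the allocation is released later by the kill_sb
hook, which is the job sysfs_kill_sb() has in the patch.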
* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
From: Serge E. Hallyn @ 2010-03-31 4:53 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
netdev, Benjamin Thery
In-Reply-To: <m1eij1rued.fsf@fess.ebiederm.org>
Quoting Eric W. Biederman (ebiederm@xmission.com):
> "Serge E. Hallyn" <serue@us.ibm.com> writes:
>
> >> > This is a huge patch, and for the most part I haven't found any problems,
> >> > except potentially this one. It looks like sysfs_rename_link() checks
> >> > old_ns and new_ns before calling sysfs_rename(). But sysfs_mutex isn't
> >> > taken until sysfs_rename(). sysfs_rename() will then proceed to do
> >> > the rename, and unconditionally set sd->ns = new_ns.
> >> >
> >> > In the meantime, it seems as though new_ns might have exited, and
> >> > sysfs_exit_ns() unset new_ns on the new parent dir. This means that
> >> > we'll end up with the namespace code having thought that it cleared
> >> > all new_ns's, but this file will have snuck by. Meaning an action on
> >> > the renamed file might dereference a freed namespace.
> >> >
> >> > Or am I way off base?
> >>
> >> There are a couple of reasons why this is not a concern.
> >>
> >> The only new_ns we clear is on the super block.
> >
> > Oops, yeah - I failed to note that.
> >
> >> sysfs itself never dereferences namespace arguments and only uses them
> >> for comparison purposes. They are just cookies that cause comparisons
> >> to differ from a sysfs perspective.
> >>
> >> The upper levels are responsible for taking care of themselves;
> >> sysfs_mutex does not protect them. If you compile out sysfs the sysfs
> >> mutex is not even present.
> >>
> >> In the worst case if the upper levels mess up we will have a stale
> >> token that we never dereference on a sysfs dirent, which in a pathological
> >> case will happen to be the same as a new namespace and we will have
> >> a spurious directory entry that we have leaked.
> >>
> >> In practice we move all network devices (and thus sysfs files) out of
> >> a network namespace before allowing it to exit.
> >
> > Ok, that makes sense too - so any tagged sysfs file created for some object
> > in a ns must be deleted at netns exit. I could imagine someone expecting
> > that if the ns exits, the tasks in the ns will exit, causing the sysfs
> > mount to be umounted and auto-deleting the files? (which of course would
> > get buggered if a task in another ns was examining the mount which it got
> > through mounts propagation) We'll have to make sure no one does that. Should
> > it be documented somewhere, or is that obvious enough?
>
> In general it is simply true. An object in a namespace either keeps
> the namespace alive, or it is destroyed when the namespace exits
> because the object is unreachable.
I guess you'd hope so :)
> So the only possible problem I can think of is of ordering the object
> destruction and calling sysfs_exit_ns. So for the moment I am going
> to vote that this is simply obvious enough not to worry about in detail.
>
> It is also pretty obvious if you trace the code and ask how does sysfs
> dirent X get destroyed.
>
> Today there is just a wee bit of automatic file destruction at the sysfs
> level. The device layer does not take advantage of it, and in hierarchical
> situations it leads to bugs. So I think that if we document anything, it
> should be that sysfs cannot safely delete anything automatically for
> you.
>
> Eric
Ok. I'm convinced.
thanks,
-serge
^ permalink raw reply
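Eric's point that sysfs treats namespace pointers as cookies, compared but
never dereferenced, can be sketched as follows. This is a simplified userspace
model, not the sysfs implementation: struct tagged_dirent, dirent_visible() and
count_visible() are hypothetical names, and treating a NULL tag as "visible
everywhere" is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified dirent: the tag is an opaque cookie. */
struct tagged_dirent {
	const char *name;
	const void *ns;		/* NULL means untagged; never dereferenced */
};

/* Visible if untagged, or if the tag equals the tag of this mount. */
static int dirent_visible(const struct tagged_dirent *sd, const void *mount_ns)
{
	assert(sd != NULL);
	return sd->ns == NULL || sd->ns == mount_ns;	/* pointer compare only */
}

/* Count the entries a mount tagged with mount_ns would show. */
static size_t count_visible(const struct tagged_dirent *ents, size_t n,
			    const void *mount_ns)
{
	size_t i, visible = 0;

	for (i = 0; i < n; i++)
		if (dirent_visible(&ents[i], mount_ns))
			visible++;
	return visible;
}
```

Because only the address is compared, a stale tag cannot crash this code. The
worst case matches the one described in the thread: a later namespace that
happens to reuse the address would see a spurious entry.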
* Re: [RFC v3] net: add PCINet driver
From: Kumar Gala @ 2010-03-31 4:46 UTC (permalink / raw)
To: Ira Snyder
Cc: linux-kernel, linuxppc-dev, netdev, Stephen Hemminger,
Arnd Bergmann, Jan-Bernd Themann
In-Reply-To: <20081105212225.GA17821@ovro.caltech.edu>
On Nov 5, 2008, at 3:22 PM, Ira Snyder wrote:
> This adds support to Linux for a virtual ethernet interface which uses the
> PCI bus as its transport mechanism. It creates a simple, familiar, and fast
> method of communication for two devices connected by a PCI interface.
>
> I have implemented client support for the Freescale MPC8349EMDS board,
> which is capable of running in PCI Agent mode (It acts like a PCI card, but
> is a complete PowerPC computer, running Linux). It is almost certainly
> trivially ported to any MPC83xx system.
>
> It was developed to work in a CompactPCI crate of computers, one of which
> is a relatively standard x86 system (acting as the host) and many PowerPC
> systems (acting as clients).
>
> RFC v2 -> RFC v3:
> * use inline functions for accessing struct circ_buf_desc
> * use pointer dereferencing on PowerPC local memory instead of ioread32()
> * move IMMR and buffer descriptor accessors inside drivers
> * update for dma_mapping_error() API changes
> * use minimal locking primitives (i.e. spin_lock() instead of _irqsave())
> * always disable checksumming, PCI is reliable
> * replace typedef cbd_t with struct circ_buf_desc
> * use get_immrbase() to get IMMR register offsets
>
> RFC v1 -> RFC v2:
> * remove vim modelines
> * use net_device->name in request_irq(), for irqbalance
> * remove unnecessary wqt_get_stats(), use default get_stats() instead
> * use dev_printk() and friends
> * add message unit to MPC8349EMDS dts file
>
> Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
> ---
> This is the third RFC posting of this driver. I got some feedback, and have
> corrected the problems. Thanks to everyone who has done review! I have
> gotten off-list feedback from several potential users, so there are
> definitely many potential users.
>
> I'll post up a revised version about once a week as long as the changes are
> minor. If they are more substantial, I'll post them as needed.
>
> The remaining issues I see in this driver:
> 1) ==== Naming ====
> The name wqt originally stood for "workqueue-test" and somewhat evolved
> over time into the current driver. I'm looking for suggestions for a
> better name. It should be the same between the host and client drivers,
> to make porting the code between them easier. The drivers are /very/
> similar other than the setup code.
> 2) ==== IMMR Usage ====
> In the Freescale client driver, I use the whole set of board control
> registers (AKA IMMR registers). I only need a very small subset of them,
> during startup to set up the DMA window. I used the full set of
> registers so that I could share the register offsets between the drivers
> (in pcinet_hw.h)
> 3) ==== Hardcoded DMA Window Address ====
> In the Freescale client driver, I just hardcoded the address of the
> outbound PCI window into the DMA transfer code. It is 0x80000000.
> Suggestions on how to get this value at runtime are welcome.
>
>
> Rationale behind some decisions:
> 1) ==== Usage of the PCINET_NET_REGISTERS_VALID bit ====
> I want to be able to use this driver from U-Boot to tftp a kernel over
> the PCI backplane, and then boot up the board. This means that the
> device descriptor memory, which lives in the client RAM, becomes invalid
> during boot.
> 2) ==== Buffer Descriptors in client memory ====
> I chose to put the buffer descriptors in client memory rather than host
> memory. It seemed more logical to me at the time. In my application,
> I'll have 19 boards + 1 host per cPCI chassis. The client -> host
> direction will see most of the traffic, and so I thought I would cut
> down on the number of PCI accesses needed. I'm willing to change this.
> 3) ==== Usage of client DMA controller for all data transfer ====
> This was done purely for speed. I tried using the CPU to transfer all
> data, and it is very slow: ~3MB/sec. Using the DMA controller gets me to
> ~40MB/sec (as tested with netperf).
> 4) ==== Static 1GB DMA window ====
> Maintaining a window while DMA's in flight, and then changing it seemed
> too complicated. Also, testing showed that using a static window gave me
> a ~10MB/sec speedup compared to moving the window for each skb.
> 5) ==== The serial driver ====
> Yes, there are two essentially separate drivers here. I needed a method
> to communicate with the U-Boot bootloader on these boards without
> plugging in a serial cable. With 19 clients + 1 host per chassis, the
> cable clutter is worth avoiding. Since everything is connected via the
> PCI bus anyway, I used that. A virtual serial port was simple to
> implement using the messaging unit hardware that I used for the network
> driver.
>
> I'll post both U-Boot drivers to their mailing list once this driver is
> finalized.
>
> Thanks,
> Ira
>
> arch/powerpc/boot/dts/mpc834x_mds.dts | 7 +
> drivers/net/Kconfig | 29 +
> drivers/net/Makefile | 3 +
> drivers/net/pcinet.h | 60 ++
> drivers/net/pcinet_fsl.c | 1358 ++++++++++++++++++++++++++++++++
> drivers/net/pcinet_host.c | 1388 +++++++++++++++++++++++++++++++++
> drivers/net/pcinet_hw.h | 77 ++
> 7 files changed, 2922 insertions(+), 0 deletions(-)
> create mode 100644 drivers/net/pcinet.h
> create mode 100644 drivers/net/pcinet_fsl.c
> create mode 100644 drivers/net/pcinet_host.c
> create mode 100644 drivers/net/pcinet_hw.h
Whatever happened to this?
- k
^ permalink raw reply
* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
From: Eric W. Biederman @ 2010-03-31 4:23 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
netdev, Benjamin Thery
In-Reply-To: <20100331040234.GA7184@us.ibm.com>
"Serge E. Hallyn" <serue@us.ibm.com> writes:
>> > This is a huge patch, and for the most part I haven't found any problems,
>> > except potentially this one. It looks like sysfs_rename_link() checks
>> > old_ns and new_ns before calling sysfs_rename(). But sysfs_mutex isn't
>> > taken until sysfs_rename(). sysfs_rename() will then proceed to do
>> > the rename, and unconditionally set sd->ns = new_ns.
>> >
>> > In the meantime, it seems as though new_ns might have exited, and
>> > sysfs_exit_ns() unset new_ns on the new parent dir. This means that
>> > we'll end up with the namespace code having thought that it cleared
>> > all new_ns's, but this file will have snuck by. Meaning an action on
>> > the renamed file might dereference a freed namespace.
>> >
>> > Or am I way off base?
>>
>> There are a couple of reasons why this is not a concern.
>>
>> The only new_ns we clear is on the super block.
>
> Oops, yeah - I failed to note that.
>
>> sysfs itself never dereferences namespace arguments and only uses them
>> for comparison purposes. They are just cookies that cause comparisons
>> to differ from a sysfs perspective.
>>
>> The upper levels are responsible for taking care of them selves
>> sysfs_mutex does not protect them. If you compile out sysfs the sysfs
>> mutex is not even present.
>>
>> In the worst case if the upper levels mess up we will have a stale
>> token that we never dereference on a sysfs dirent, which in a pathological
>> case will happen to be the same as a new namespace and we will have
>> a spurious directory entry that we have leaked.
>>
>> In practice we move all network devices (and thus sysfs files) out of
>> a network namespace before allowing it to exit.
>
> Ok, that makes sense too - so any tagged sysfs file created for some object
> in a ns must be deleted at netns exit. I could imagine someone expecting
> that if the ns exits, the tasks in the ns will exit, causing the sysfs
> mount to be umounted and auto-deleting the files? (which of course would
> get buggered if task in other ns was examining the mount which it got
> through mounts propagation) We'll have to make sure noone does that. Should
> it be documented somewhere, or is that obvious enough?
In general it is simply true. An object in a namespace either keeps
the namespace alive, or it is destroyed when the namespace exits
because the object is unreachable.
So the only possible problem I can think of is the ordering of object
destruction relative to the call to sysfs_exit_ns. For the moment I am going
to vote that this is simply obvious enough not to worry about in detail.
It is also pretty obvious if you trace the code and ask how does sysfs
dirent X get destroyed.
Today there is just a wee bit of automatic file destruction at the sysfs
level. The device layer does not take advantage of it, and in hierarchical
situations it leads to bugs. So I think that if we document anything, it
should be that sysfs cannot safely delete anything automatically for
you.
Eric
^ permalink raw reply
* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
From: Serge E. Hallyn @ 2010-03-31 4:02 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
netdev, Benjamin Thery
In-Reply-To: <m1zl1prwie.fsf@fess.ebiederm.org>
Quoting Eric W. Biederman (ebiederm@xmission.com):
> "Serge E. Hallyn" <serue@us.ibm.com> writes:
>
> > Quoting Eric W. Biederman (ebiederm@xmission.com):
> >> int sysfs_rename(struct sysfs_dirent *sd,
> >> - struct sysfs_dirent *new_parent_sd, const char *new_name)
> >> + struct sysfs_dirent *new_parent_sd, const void *new_ns,
> >> + const char *new_name)
> >> {
> >> const char *dup_name = NULL;
> >> int error;
> >> @@ -743,12 +789,12 @@ int sysfs_rename(struct sysfs_dirent *sd,
> >> mutex_lock(&sysfs_mutex);
> >>
> >> error = 0;
> >> - if ((sd->s_parent == new_parent_sd) &&
> >> + if ((sd->s_parent == new_parent_sd) && (sd->s_ns == new_ns) &&
> >> (strcmp(sd->s_name, new_name) == 0))
> >> goto out; /* nothing to rename */
> >>
> >> error = -EEXIST;
> >> - if (sysfs_find_dirent(new_parent_sd, new_name))
> >> + if (sysfs_find_dirent(new_parent_sd, new_ns, new_name))
> >> goto out;
> >>
> >> /* rename sysfs_dirent */
> >> @@ -770,6 +816,7 @@ int sysfs_rename(struct sysfs_dirent *sd,
> >> sd->s_parent = new_parent_sd;
> >> sysfs_link_sibling(sd);
> >> }
> >> + sd->s_ns = new_ns;
> >>
> >> error = 0;
> >> out:
> >
> > ...
> >
> >> +void sysfs_exit_ns(enum kobj_ns_type type, const void *ns)
> >> +{
> >> + struct super_block *sb;
> >> +
> >> + mutex_lock(&sysfs_mutex);
> >> + spin_lock(&sb_lock);
> >> + list_for_each_entry(sb, &sysfs_fs_type.fs_supers, s_instances) {
> >> + struct sysfs_super_info *info = sysfs_info(sb);
> >> + /* Ignore superblocks that are in the process of unmounting */
> >> + if (sb->s_count <= S_BIAS)
> >> + continue;
> >> + /* Ignore superblocks with the wrong ns */
> >> + if (info->ns[type] != ns)
> >> + continue;
> >> + info->ns[type] = NULL;
> >> + }
> >> + spin_unlock(&sb_lock);
> >> + mutex_unlock(&sysfs_mutex);
> >> +}
> >> +
> >
> > ..
> >
> >> @@ -136,6 +138,7 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
> >> const char *old, const char *new)
> >> {
> >> struct sysfs_dirent *parent_sd, *sd = NULL;
> >> + const void *old_ns = NULL, *new_ns = NULL;
> >> int result;
> >>
> >> if (!kobj)
> >> @@ -143,8 +146,11 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
> >> else
> >> parent_sd = kobj->sd;
> >>
> >> + if (targ->sd)
> >> + old_ns = targ->sd->s_ns;
> >> +
> >> result = -ENOENT;
> >> - sd = sysfs_get_dirent(parent_sd, old);
> >> + sd = sysfs_get_dirent(parent_sd, old_ns, old);
> >> if (!sd)
> >> goto out;
> >>
> >> @@ -154,7 +160,10 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
> >> if (sd->s_symlink.target_sd->s_dir.kobj != targ)
> >> goto out;
> >>
> >> - result = sysfs_rename(sd, parent_sd, new);
> >> + if (sysfs_ns_type(parent_sd))
> >> + new_ns = targ->ktype->namespace(targ);
> >> +
> >> + result = sysfs_rename(sd, parent_sd, new_ns, new);
> >>
> >> out:
> >> sysfs_put(sd);
> >
> > This is a huge patch, and for the most part I haven't found any problems,
> > except potentially this one. It looks like sysfs_rename_link() checks
> > old_ns and new_ns before calling sysfs_rename(). But sysfs_mutex isn't
> > taken until sysfs_rename(). sysfs_rename() will then proceed to do
> > the rename, and unconditionally set sd->ns = new_ns.
> >
> > In the meantime, it seems as though new_ns might have exited, and
> > sysfs_exit_ns() unset new_ns on the new parent dir. This means that
> > we'll end up with the namespace code having thought that it cleared
> > all new_ns's, but this file will have snuck by. Meaning an action on
> > the renamed file might dereference a freed namespace.
> >
> > Or am I way off base?
>
> There are a couple of reasons why this is not a concern.
>
> The only new_ns we clear is on the super block.
Oops, yeah - I failed to note that.
> sysfs itself never dereferences namespace arguments and only uses them
> for comparison purposes. They are just cookies that cause comparisons
> to differ from a sysfs perspective.
>
> The upper levels are responsible for taking care of them selves
> sysfs_mutex does not protect them. If you compile out sysfs the sysfs
> mutex is not even present.
>
> In the worst case if the upper levels mess up we will have a stale
> token that we never dereference on a sysfs dirent, which in a pathological
> case will happen to be the same as a new namespace and we will have
> a spurious directory entry that we have leaked.
>
> In practice we move all network devices (and thus sysfs files) out of
> a network namespace before allowing it to exit.
Ok, that makes sense too - so any tagged sysfs file created for some object
in a ns must be deleted at netns exit. I could imagine someone expecting
that if the ns exits, the tasks in the ns will exit, causing the sysfs
mount to be unmounted and the files to be deleted automatically. (That would
of course break if a task in another ns was examining the mount, which it got
through mount propagation.) We'll have to make sure no one does that. Should
it be documented somewhere, or is that obvious enough?
(I'm thinking of other namespaces in the future, not net_ns which I
understand doesn't do that)
> The network namespace
> is not listed so it is invisible to anyone wanting to poke a network
> device into an exiting network namespace. The unlisting of the
> network namespace and the device_rename both happen under the
> rtnl_lock which guarantees they are serialized.
>
> Eric
^ permalink raw reply
* [PATCH RFC] virtio_net: use NAPI for xmit (UNTESTED)
From: Rusty Russell @ 2010-03-31 3:59 UTC (permalink / raw)
To: Michael S. Tsirkin, Shirley Ma; +Cc: netdev, Herbert Xu
I don't have time to chase this, but it's been sitting in my patch queue
for a while. Wondered if Michael or Shirley wanted to toy with it.
Thanks!
Rusty.
This is closer to the way tg3 and ixgbe do it: use the NAPI framework to
free transmitted packets. It neatens things a little as well.
Changes since last version:
1) Use the tx lock for the xmit_poll to synchronize against
start_xmit; it might be overkill, but it's simple.
2) Don't wake queue if the carrier is gone.
(Note: a side effect of this is that we are lazier in freeing old xmit skbs.
This might be a slight win).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
drivers/net/virtio_net.c | 71 ++++++++++++++++++++++++++++++++---------------
1 file changed, 49 insertions(+), 22 deletions(-)
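The wake-up rule described in the changelog above can be sketched in
userspace as follows. This is a simplified, single-threaded model, not the
driver code: struct demo_txq and the demo_* functions are hypothetical,
DEMO_MAX_SKB_FRAGS is an arbitrary stand-in for MAX_SKB_FRAGS, and the tx
lock that the patch takes in virtnet_xmit_poll is left out because nothing
here runs concurrently.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: arbitrary stand-in value, not the kernel's MAX_SKB_FRAGS. */
#define DEMO_MAX_SKB_FRAGS	17
/* Worst-case number of ring slots a single packet may need. */
#define DEMO_WAKE_THRESH	(2 + DEMO_MAX_SKB_FRAGS)

/* Hypothetical model of the transmit side of the device. */
struct demo_txq {
	unsigned int capacity;	/* free ring slots */
	unsigned int completed;	/* slots the host is done with, not yet reclaimed */
	bool stopped;		/* models netif_stop_queue()/netif_wake_queue() */
	bool carrier_ok;	/* models netif_carrier_ok() */
};

/* Reclaim finished slots; plays the role of free_old_xmit_skbs(). */
static unsigned int demo_free_old(struct demo_txq *q)
{
	unsigned int freed = q->completed;

	q->completed = 0;
	return freed;
}

/* The host side reports that n slots have been transmitted. */
static void demo_host_complete(struct demo_txq *q, unsigned int n)
{
	q->completed += n;
}

/* start_xmit analogue: queue a packet needing sgs slots. */
static int demo_start_xmit(struct demo_txq *q, unsigned int sgs)
{
	assert(sgs >= 1 && sgs <= DEMO_WAKE_THRESH);

	if (q->stopped || sgs > q->capacity) {
		q->stopped = true;
		return -1;	/* models NETDEV_TX_BUSY */
	}
	q->capacity -= sgs;

	/* Stop early if the next worst-case packet might not fit. */
	if (q->capacity < DEMO_WAKE_THRESH) {
		q->capacity += demo_free_old(q);
		if (q->capacity < DEMO_WAKE_THRESH)
			q->stopped = true;
	}
	return 0;
}

/* xmit poll analogue: reclaim, then wake only if there is room and link. */
static void demo_xmit_poll(struct demo_txq *q)
{
	q->capacity += demo_free_old(q);
	if (q->capacity >= DEMO_WAKE_THRESH && q->carrier_ok)
		q->stopped = false;
}
```

In this model a stopped queue stays stopped after a poll if the carrier is
down, even when enough slots have been reclaimed, which is the behaviour
change 2) above asks for.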
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -47,6 +47,9 @@ struct virtnet_info
struct napi_struct napi;
unsigned int status;
+ /* We free packets and decide whether to restart xmit here. */
+ struct napi_struct xmit_napi;
+
/* Number of input buffers, and max we've ever had. */
unsigned int num, max;
@@ -60,6 +63,9 @@ struct virtnet_info
struct sk_buff_head recv;
struct sk_buff_head send;
+ /* Capacity left in xmit queue. */
+ unsigned int capacity;
+
/* Work struct for refilling if we run low on memory. */
struct delayed_work refill;
@@ -111,11 +117,8 @@ static void skb_xmit_done(struct virtque
{
struct virtnet_info *vi = svq->vdev->priv;
- /* Suppress further interrupts. */
- svq->vq_ops->disable_cb(svq);
-
/* We were probably waiting for more output buffers. */
- netif_wake_queue(vi->dev);
+ napi_schedule(&vi->xmit_napi);
}
static void receive_skb(struct net_device *dev, struct sk_buff *skb,
@@ -455,6 +458,29 @@ static unsigned int free_old_xmit_skbs(s
return tot_sgs;
}
+static int virtnet_xmit_poll(struct napi_struct *xmit_napi, int budget)
+{
+ struct virtnet_info *vi =
+ container_of(xmit_napi, struct virtnet_info, xmit_napi);
+
+ /* Don't access vq/capacity at same time as start_xmit. */
+ __netif_tx_lock(netdev_get_tx_queue(vi->dev, 0), smp_processor_id());
+
+ vi->capacity += free_old_xmit_skbs(vi);
+ if (vi->capacity >= 2 + MAX_SKB_FRAGS) {
+ /* Suppress further xmit interrupts. */
+ vi->svq->vq_ops->disable_cb(vi->svq);
+ napi_complete(xmit_napi);
+
+ /* Don't wake it if link is down. */
+ if (likely(netif_carrier_ok(vi->dev)))
+ netif_wake_queue(vi->dev);
+ }
+
+ __netif_tx_unlock(netdev_get_tx_queue(vi->dev, 0));
+ return 1;
+}
+
static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
{
struct scatterlist sg[2+MAX_SKB_FRAGS];
@@ -509,10 +535,6 @@ static netdev_tx_t start_xmit(struct sk_
struct virtnet_info *vi = netdev_priv(dev);
int capacity;
-again:
- /* Free up any pending old buffers before queueing new ones. */
- free_old_xmit_skbs(vi);
-
/* Try to transmit */
capacity = xmit_skb(vi, skb);
@@ -520,14 +542,13 @@ again:
if (unlikely(capacity < 0)) {
netif_stop_queue(dev);
dev_warn(&dev->dev, "Unexpected full queue\n");
- if (unlikely(!vi->svq->vq_ops->enable_cb(vi->svq))) {
- vi->svq->vq_ops->disable_cb(vi->svq);
- netif_start_queue(dev);
- goto again;
- }
+ /* If we missed an interrupt, we let virtnet_xmit_poll deal. */
+ if (unlikely(!vi->svq->vq_ops->enable_cb(vi->svq)))
+ napi_schedule(&vi->xmit_napi);
return NETDEV_TX_BUSY;
}
vi->svq->vq_ops->kick(vi->svq);
+ vi->capacity = capacity;
/*
* Put new one in send queue. You'd expect we'd need this before
@@ -545,14 +566,13 @@ again:
/* Apparently nice girls don't return TX_BUSY; stop the queue
* before it gets out of hand. Naturally, this wastes entries. */
if (capacity < 2+MAX_SKB_FRAGS) {
- netif_stop_queue(dev);
- if (unlikely(!vi->svq->vq_ops->enable_cb(vi->svq))) {
- /* More just got used, free them then recheck. */
- capacity += free_old_xmit_skbs(vi);
- if (capacity >= 2+MAX_SKB_FRAGS) {
- netif_start_queue(dev);
- vi->svq->vq_ops->disable_cb(vi->svq);
- }
+ /* Free old skbs; might make more capacity. */
+ vi->capacity = capacity + free_old_xmit_skbs(vi);
+ if (unlikely(vi->capacity < 2+MAX_SKB_FRAGS)) {
+ netif_stop_queue(dev);
+ /* Missed xmit irq? virtnet_xmit_poll will deal. */
+ if (unlikely(!vi->svq->vq_ops->enable_cb(vi->svq)))
+ napi_schedule(&vi->xmit_napi);
}
}
@@ -590,6 +610,7 @@ static int virtnet_open(struct net_devic
struct virtnet_info *vi = netdev_priv(dev);
napi_enable(&vi->napi);
+ napi_enable(&vi->xmit_napi);
/* If all buffers were filled by other side before we napi_enabled, we
* won't get another interrupt, so process any outstanding packets
@@ -652,6 +673,7 @@ static int virtnet_close(struct net_devi
struct virtnet_info *vi = netdev_priv(dev);
napi_disable(&vi->napi);
+ napi_disable(&vi->xmit_napi);
return 0;
}
@@ -818,9 +840,13 @@ static void virtnet_update_status(struct
if (vi->status & VIRTIO_NET_S_LINK_UP) {
netif_carrier_on(vi->dev);
- netif_wake_queue(vi->dev);
+ /* Make sure virtnet_xmit_poll sees carrier enabled. */
+ wmb();
+ napi_schedule(&vi->xmit_napi);
} else {
netif_carrier_off(vi->dev);
+ /* Make sure virtnet_xmit_poll sees carrier disabled. */
+ wmb();
netif_stop_queue(vi->dev);
}
}
@@ -883,6 +909,7 @@ static int virtnet_probe(struct virtio_d
/* Set up our device-specific information */
vi = netdev_priv(dev);
netif_napi_add(dev, &vi->napi, virtnet_poll, napi_weight);
+ netif_napi_add(dev, &vi->xmit_napi, virtnet_xmit_poll, 64);
vi->dev = dev;
vi->vdev = vdev;
vdev->priv = vi;
^ permalink raw reply
* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
From: Eric W. Biederman @ 2010-03-31 3:38 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
netdev, Benjamin Thery
In-Reply-To: <20100331024346.GB27001@us.ibm.com>
"Serge E. Hallyn" <serue@us.ibm.com> writes:
> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> int sysfs_rename(struct sysfs_dirent *sd,
>> - struct sysfs_dirent *new_parent_sd, const char *new_name)
>> + struct sysfs_dirent *new_parent_sd, const void *new_ns,
>> + const char *new_name)
>> {
>> const char *dup_name = NULL;
>> int error;
>> @@ -743,12 +789,12 @@ int sysfs_rename(struct sysfs_dirent *sd,
>> mutex_lock(&sysfs_mutex);
>>
>> error = 0;
>> - if ((sd->s_parent == new_parent_sd) &&
>> + if ((sd->s_parent == new_parent_sd) && (sd->s_ns == new_ns) &&
>> (strcmp(sd->s_name, new_name) == 0))
>> goto out; /* nothing to rename */
>>
>> error = -EEXIST;
>> - if (sysfs_find_dirent(new_parent_sd, new_name))
>> + if (sysfs_find_dirent(new_parent_sd, new_ns, new_name))
>> goto out;
>>
>> /* rename sysfs_dirent */
>> @@ -770,6 +816,7 @@ int sysfs_rename(struct sysfs_dirent *sd,
>> sd->s_parent = new_parent_sd;
>> sysfs_link_sibling(sd);
>> }
>> + sd->s_ns = new_ns;
>>
>> error = 0;
>> out:
>
> ...
>
>> +void sysfs_exit_ns(enum kobj_ns_type type, const void *ns)
>> +{
>> + struct super_block *sb;
>> +
>> + mutex_lock(&sysfs_mutex);
>> + spin_lock(&sb_lock);
>> + list_for_each_entry(sb, &sysfs_fs_type.fs_supers, s_instances) {
>> + struct sysfs_super_info *info = sysfs_info(sb);
>> + /* Ignore superblocks that are in the process of unmounting */
>> + if (sb->s_count <= S_BIAS)
>> + continue;
>> + /* Ignore superblocks with the wrong ns */
>> + if (info->ns[type] != ns)
>> + continue;
>> + info->ns[type] = NULL;
>> + }
>> + spin_unlock(&sb_lock);
>> + mutex_unlock(&sysfs_mutex);
>> +}
>> +
>
> ..
>
>> @@ -136,6 +138,7 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
>> const char *old, const char *new)
>> {
>> struct sysfs_dirent *parent_sd, *sd = NULL;
>> + const void *old_ns = NULL, *new_ns = NULL;
>> int result;
>>
>> if (!kobj)
>> @@ -143,8 +146,11 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
>> else
>> parent_sd = kobj->sd;
>>
>> + if (targ->sd)
>> + old_ns = targ->sd->s_ns;
>> +
>> result = -ENOENT;
>> - sd = sysfs_get_dirent(parent_sd, old);
>> + sd = sysfs_get_dirent(parent_sd, old_ns, old);
>> if (!sd)
>> goto out;
>>
>> @@ -154,7 +160,10 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
>> if (sd->s_symlink.target_sd->s_dir.kobj != targ)
>> goto out;
>>
>> - result = sysfs_rename(sd, parent_sd, new);
>> + if (sysfs_ns_type(parent_sd))
>> + new_ns = targ->ktype->namespace(targ);
>> +
>> + result = sysfs_rename(sd, parent_sd, new_ns, new);
>>
>> out:
>> sysfs_put(sd);
>
> This is a huge patch, and for the most part I haven't found any problems,
> except potentially this one. It looks like sysfs_rename_link() checks
> old_ns and new_ns before calling sysfs_rename(). But sysfs_mutex isn't
> taken until sysfs_rename(). sysfs_rename() will then proceed to do
> the rename, and unconditionally set sd->ns = new_ns.
>
> In the meantime, it seems as though new_ns might have exited, and
> sysfs_exit_ns() unset new_ns on the new parent dir. This means that
> we'll end up with the namespace code having thought that it cleared
> all new_ns's, but this file will have snuck by. Meaning an action on
> the renamed file might dereference a freed namespace.
>
> Or am I way off base?
There are a couple of reasons why this is not a concern.
The only new_ns we clear is on the super block.
sysfs itself never dereferences namespace arguments and only uses them
for comparison purposes. They are just cookies that cause comparisons
to differ from a sysfs perspective.
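To illustrate the point about cookies, here is a minimal userspace sketch,
not the sysfs code itself. The names demo_dirent and demo_find are
hypothetical; the only property carried over from the patch is that the
namespace tag is a const void * that is compared for equality and never
dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for a sysfs dirent: a name plus an
 * opaque namespace tag.
 */
struct demo_dirent {
	const char *name;
	const void *ns;		/* cookie: compared, never dereferenced */
};

/*
 * Look up an entry by (ns, name). The tag is only tested for pointer
 * equality, so it does not matter whether it still points at live memory.
 */
static const struct demo_dirent *demo_find(const struct demo_dirent *tbl,
					   size_t n, const void *ns,
					   const char *name)
{
	size_t i;

	assert(name != NULL);
	assert(tbl != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (tbl[i].ns == ns && strcmp(tbl[i].name, name) == 0)
			return &tbl[i];
	return NULL;
}
```

Two entries with the same name can coexist as long as their tags differ. A
stale tag left on an entry is harmless to the lookup itself: it either never
matches again or, in the pathological case, matches a new namespace that
happens to reuse the same address.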
The upper levels are responsible for taking care of themselves;
sysfs_mutex does not protect them. If you compile out sysfs the sysfs
mutex is not even present.
In the worst case if the upper levels mess up we will have a stale
token that we never dereference on a sysfs dirent, which in a pathological
case will happen to be the same as a new namespace and we will have
a spurious directory entry that we have leaked.
In practice we move all network devices (and thus sysfs files) out of
a network namespace before allowing it to exit. The network namespace
is not listed so it is invisible to anyone wanting to poke a network
device into an exiting network namespace. The unlisting of the
network namespace and the device_rename both happen under the
rtnl_lock which guarantees they are serialized.
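The serialization argument can be sketched in userspace like this. It is a
toy model under stated assumptions, not the networking code: demo_rtnl
stands in for the rtnl_lock, struct demo_ns and struct demo_dev are
hypothetical, and moving devices to a caller-supplied fallback namespace is
a simplification of what happens to devices at namespace exit.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the rtnl_lock: one lock covers both operations. */
static pthread_mutex_t demo_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical namespace: visible to movers only while listed. */
struct demo_ns {
	bool listed;
	int ndevs;
};

/* Hypothetical device: belongs to at most one namespace. */
struct demo_dev {
	struct demo_ns *ns;
};

/* Caller must hold demo_rtnl. */
static void demo_dev_set_ns(struct demo_dev *dev, struct demo_ns *to)
{
	if (dev->ns)
		dev->ns->ndevs--;
	dev->ns = to;
	to->ndevs++;
}

/* Move a device into a namespace; fails if the target is unlisted. */
static int demo_dev_move(struct demo_dev *dev, struct demo_ns *to)
{
	int ret = -1;

	assert(dev != NULL && to != NULL);

	pthread_mutex_lock(&demo_rtnl);
	if (to->listed) {
		demo_dev_set_ns(dev, to);
		ret = 0;
	}
	pthread_mutex_unlock(&demo_rtnl);
	return ret;
}

/*
 * Exit a namespace: unlist it and evacuate its devices in one critical
 * section, so no mover can observe it listed after evacuation started.
 */
static void demo_ns_exit(struct demo_ns *ns, struct demo_ns *fallback,
			 struct demo_dev *devs, size_t n)
{
	size_t i;

	assert(ns != NULL && fallback != NULL && ns != fallback);

	pthread_mutex_lock(&demo_rtnl);
	ns->listed = false;
	for (i = 0; i < n; i++)
		if (devs[i].ns == ns)
			demo_dev_set_ns(&devs[i], fallback);
	pthread_mutex_unlock(&demo_rtnl);
}
```

Because the unlisting and every move take the same lock, a move either
completes before the exit starts (and its device is then evacuated) or it
sees the namespace unlisted and fails.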
Eric
^ permalink raw reply
* [PATCH] net_sched: minor netns related cleanup
From: Tom Goff @ 2010-03-31 2:43 UTC (permalink / raw)
To: David Miller; +Cc: Alexey Dobriyan, netdev
In-Reply-To: <20100326.201426.05866973.davem@davemloft.net>
These changes were suggested by Alexey Dobriyan <adobriyan@gmail.com>:
- psched_show() does not use any private data so just pass NULL to
psched_open()
- remove unnecessary return statement
Signed-off-by: Tom Goff <thomas.goff@boeing.com>
---
net/sched/sch_api.c | 4 +---
1 files changed, 1 insertions(+), 3 deletions(-)
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 6d6fe16..c65866d 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1683,7 +1683,7 @@ static int psched_show(struct seq_file *seq, void *v)
static int psched_open(struct inode *inode, struct file *file)
{
- return single_open(file, psched_show, PDE(inode)->data);
+ return single_open(file, psched_show, NULL);
}
static const struct file_operations psched_fops = {
@@ -1708,8 +1708,6 @@ static int __net_init psched_net_init(struct net *net)
static void __net_exit psched_net_exit(struct net *net)
{
proc_net_remove(net, "psched");
-
- return;
}
#else
static int __net_init psched_net_init(struct net *net)
--
1.6.3.3
^ permalink raw reply related
* Re: [PATCH] netdev/fec.c: add phylib supporting to enable carrier detection
From: Bryan Wu @ 2010-03-31 3:07 UTC (permalink / raw)
To: Wolfram Sang
Cc: netdev, s.hauer, linux-kernel, kernel-team, gerg,
linux-arm-kernel
In-Reply-To: <20100331030602.GD3520@pengutronix.de>
Wolfram Sang wrote:
> On Mon, Mar 29, 2010 at 04:40:09PM +0800, Bryan Wu wrote:
>
>> On 03/27/2010 08:57 PM, Wolfram Sang wrote:
>>
>>> On Fri, Mar 26, 2010 at 05:50:52PM +0800, Bryan Wu wrote:
>>>
>>>> BugLink: http://bugs.launchpad.net/bugs/457878
>>>>
>>>> - removed old MII phy control code
>>>> - add phylib supporting
>>>> - add ethtool interface to make user space NetworkManager works
>>>>
>>>> Tested on Freescale i.MX51 Babbage board.
>>>>
>>> Sadly, I have problems here booting a custom board:
>>>
>>>
>> Firstly, I working on our Ubuntu Lucid 2.6.31 based kernel. This patch works
>> fine on our system. Then I forward port it to 2.6.34-rc2 Linus mainline kernel.
>> It also works fine on my hardware.
>>
>
> Hmm, can I provide some more information to get an idea what is happening here?
>
>
No problem, man. I'd be glad to help with this.
-Bryan
^ permalink raw reply
* Re: [PATCH] netdev/fec.c: add phylib supporting to enable carrier detection
From: Wolfram Sang @ 2010-03-31 3:06 UTC (permalink / raw)
To: Bryan Wu
Cc: amit.kucheria, netdev, s.hauer, linux-kernel, kernel-team, gerg,
linux-arm-kernel
In-Reply-To: <4BB06769.7040708@canonical.com>
On Mon, Mar 29, 2010 at 04:40:09PM +0800, Bryan Wu wrote:
> On 03/27/2010 08:57 PM, Wolfram Sang wrote:
>> On Fri, Mar 26, 2010 at 05:50:52PM +0800, Bryan Wu wrote:
>>> BugLink: http://bugs.launchpad.net/bugs/457878
>>>
>>> - removed old MII phy control code
>>> - add phylib supporting
>>> - add ethtool interface to make user space NetworkManager works
>>>
>>> Tested on Freescale i.MX51 Babbage board.
>>
>> Sadly, I have problems here booting a custom board:
>>
>
> Firstly, I working on our Ubuntu Lucid 2.6.31 based kernel. This patch works
> fine on our system. Then I forward port it to 2.6.34-rc2 Linus mainline kernel.
> It also works fine on my hardware.
Hmm, can I provide some more information to get an idea what is happening here?
--
Pengutronix e.K. | Wolfram Sang |
Industrial Linux Solutions | http://www.pengutronix.de/ |
^ permalink raw reply
* Re: [PATCH] netdev/fec.c: add phylib supporting to enable carrier detection
From: Bryan Wu @ 2010-03-31 2:49 UTC (permalink / raw)
To: Bryan Wu
Cc: netdev, s.hauer, linux-kernel, w.sang, kernel-team, gerg,
linux-arm-kernel
In-Reply-To: <1269597052-10104-1-git-send-email-bryan.wu@canonical.com>
Sascha and Greg,
Could you please help to review and test this patch?
Thanks a lot,
-Bryan
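As a review aid for the FEC_MMFR_* definitions in the quoted patch below,
here is a small standalone check that they compose to the same frame values
as the mk_mii_read()/mk_mii_write() macros being removed. The macros are
copied from the patch with unsigned constants and extra parentheses added;
the demo_* helpers are hypothetical and only show one plausible way to
combine the fields, since the patch's own MDIO accessors are not part of
this excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Field definitions as in the patch (made unsigned, arguments parenthesized). */
#define FEC_MMFR_ST		(1u << 30)
#define FEC_MMFR_OP_READ	(2u << 28)
#define FEC_MMFR_OP_WRITE	(1u << 28)
#define FEC_MMFR_PA(v)		(((v) & 0x1fu) << 23)
#define FEC_MMFR_RA(v)		(((v) & 0x1fu) << 18)
#define FEC_MMFR_TA		(2u << 16)
#define FEC_MMFR_DATA(v)	((v) & 0xffffu)

/* Hypothetical helper: build a management frame that reads reg on phy. */
static uint32_t demo_mmfr_read(unsigned int phy, unsigned int reg)
{
	assert(phy < 32 && reg < 32);
	return FEC_MMFR_ST | FEC_MMFR_OP_READ | FEC_MMFR_PA(phy) |
	       FEC_MMFR_RA(reg) | FEC_MMFR_TA;
}

/* Hypothetical helper: build a management frame that writes val to reg. */
static uint32_t demo_mmfr_write(unsigned int phy, unsigned int reg,
				unsigned int val)
{
	assert(phy < 32 && reg < 32);
	return FEC_MMFR_ST | FEC_MMFR_OP_WRITE | FEC_MMFR_PA(phy) |
	       FEC_MMFR_RA(reg) | FEC_MMFR_TA | FEC_MMFR_DATA(val);
}

/* Old encoding: mk_mii_read() plus the PHY address OR-ed in by mii_queue. */
static uint32_t demo_old_read(unsigned int phy, unsigned int reg)
{
	return (0x60020000u | ((reg & 0x1fu) << 18)) | (phy << 23);
}

/* Old encoding: mk_mii_write() plus the PHY address OR-ed in by mii_queue. */
static uint32_t demo_old_write(unsigned int phy, unsigned int reg,
			       unsigned int val)
{
	return (0x50020000u | ((reg & 0x1fu) << 18) | (val & 0xffffu)) |
	       (phy << 23);
}
```

With phy and reg both zero the new fields give 0x60020000 for a read and
0x50020000 for a write, which are exactly the magic constants in the removed
macros.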
On 03/26/2010 05:50 PM, Bryan Wu wrote:
> BugLink: http://bugs.launchpad.net/bugs/457878
>
> - removed old MII phy control code
> - add phylib supporting
> - add ethtool interface to make user space NetworkManager works
>
> Tested on Freescale i.MX51 Babbage board.
>
> This patch is based on a patch from Frederic Rodo<fred.rodo@gmail.com>
>
> Cc: Frederic Rodo<fred.rodo@gmail.com>
> Signed-off-by: Bryan Wu<bryan.wu@canonical.com>
> ---
> drivers/net/Kconfig | 1 +
> drivers/net/fec.c | 1125 ++++++++++++---------------------------------------
> 2 files changed, 253 insertions(+), 873 deletions(-)
>
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index 0ba5b8e..41f6a70 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -1916,6 +1916,7 @@ config FEC
> bool "FEC ethernet controller (of ColdFire and some i.MX CPUs)"
> depends on M523x || M527x || M5272 || M528x || M520x || M532x || \
> MACH_MX27 || ARCH_MX35 || ARCH_MX25 || ARCH_MX5
> + select PHYLIB
> help
> Say Y here if you want to use the built-in 10/100 Fast ethernet
> controller on some Motorola ColdFire and Freescale i.MX processors.
> diff --git a/drivers/net/fec.c b/drivers/net/fec.c
> index 9f98c1c..fca1f66 100644
> --- a/drivers/net/fec.c
> +++ b/drivers/net/fec.c
> @@ -40,6 +40,7 @@
> #include<linux/irq.h>
> #include<linux/clk.h>
> #include<linux/platform_device.h>
> +#include<linux/phy.h>
>
> #include<asm/cacheflush.h>
>
> @@ -61,7 +62,6 @@
> * Define the fixed address of the FEC hardware.
> */
> #if defined(CONFIG_M5272)
> -#define HAVE_mii_link_interrupt
>
> static unsigned char fec_mac_default[] = {
> 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
> @@ -86,23 +86,6 @@ static unsigned char fec_mac_default[] = {
> #endif
> #endif /* CONFIG_M5272 */
>
> -/* Forward declarations of some structures to support different PHYs */
> -
> -typedef struct {
> - uint mii_data;
> - void (*funct)(uint mii_reg, struct net_device *dev);
> -} phy_cmd_t;
> -
> -typedef struct {
> - uint id;
> - char *name;
> -
> - const phy_cmd_t *config;
> - const phy_cmd_t *startup;
> - const phy_cmd_t *ack_int;
> - const phy_cmd_t *shutdown;
> -} phy_info_t;
> -
> /* The number of Tx and Rx buffers. These are allocated from the page
> * pool. The code may assume these are power of two, so it it best
> * to keep them that size.
> @@ -189,29 +172,21 @@ struct fec_enet_private {
> uint tx_full;
> /* hold while accessing the HW like ringbuffer for tx/rx but not MAC */
> spinlock_t hw_lock;
> - /* hold while accessing the mii_list_t() elements */
> - spinlock_t mii_lock;
> -
> - uint phy_id;
> - uint phy_id_done;
> - uint phy_status;
> - uint phy_speed;
> - phy_info_t const *phy;
> - struct work_struct phy_task;
>
> - uint sequence_done;
> - uint mii_phy_task_queued;
> + struct platform_device *pdev;
>
> - uint phy_addr;
> + int opened;
>
> + /* Phylib and MDIO interface */
> + struct mii_bus *mii_bus;
> + struct phy_device *phy_dev;
> + int mii_timeout;
> + uint phy_speed;
> int index;
> - int opened;
> int link;
> - int old_link;
> int full_duplex;
> };
>
> -static void fec_enet_mii(struct net_device *dev);
> static irqreturn_t fec_enet_interrupt(int irq, void * dev_id);
> static void fec_enet_tx(struct net_device *dev);
> static void fec_enet_rx(struct net_device *dev);
> @@ -219,67 +194,20 @@ static int fec_enet_close(struct net_device *dev);
> static void fec_restart(struct net_device *dev, int duplex);
> static void fec_stop(struct net_device *dev);
>
> +/* FEC MII MMFR bits definition */
> +#define FEC_MMFR_ST (1<< 30)
> +#define FEC_MMFR_OP_READ (2<< 28)
> +#define FEC_MMFR_OP_WRITE (1<< 28)
> +#define FEC_MMFR_PA(v) ((v& 0x1f)<< 23)
> +#define FEC_MMFR_RA(v) ((v& 0x1f)<< 18)
> +#define FEC_MMFR_TA (2<< 16)
> +#define FEC_MMFR_DATA(v) (v& 0xffff)
>
> -/* MII processing. We keep this as simple as possible. Requests are
> - * placed on the list (if there is room). When the request is finished
> - * by the MII, an optional function may be called.
> - */
> -typedef struct mii_list {
> - uint mii_regval;
> - void (*mii_func)(uint val, struct net_device *dev);
> - struct mii_list *mii_next;
> -} mii_list_t;
> -
> -#define NMII 20
> -static mii_list_t mii_cmds[NMII];
> -static mii_list_t *mii_free;
> -static mii_list_t *mii_head;
> -static mii_list_t *mii_tail;
> -
> -static int mii_queue(struct net_device *dev, int request,
> - void (*func)(uint, struct net_device *));
> -
> -/* Make MII read/write commands for the FEC */
> -#define mk_mii_read(REG) (0x60020000 | ((REG& 0x1f)<< 18))
> -#define mk_mii_write(REG, VAL) (0x50020000 | ((REG& 0x1f)<< 18) | \
> - (VAL& 0xffff))
> -#define mk_mii_end 0
> +#define FEC_MII_TIMEOUT 10000
>
> /* Transmitter timeout */
> #define TX_TIMEOUT (2 * HZ)
>
> -/* Register definitions for the PHY */
> -
> -#define MII_REG_CR 0 /* Control Register */
> -#define MII_REG_SR 1 /* Status Register */
> -#define MII_REG_PHYIR1 2 /* PHY Identification Register 1 */
> -#define MII_REG_PHYIR2 3 /* PHY Identification Register 2 */
> -#define MII_REG_ANAR 4 /* A-N Advertisement Register */
> -#define MII_REG_ANLPAR 5 /* A-N Link Partner Ability Register */
> -#define MII_REG_ANER 6 /* A-N Expansion Register */
> -#define MII_REG_ANNPTR 7 /* A-N Next Page Transmit Register */
> -#define MII_REG_ANLPRNPR 8 /* A-N Link Partner Received Next Page Reg. */
> -
> -/* values for phy_status */
> -
> -#define PHY_CONF_ANE 0x0001 /* 1 auto-negotiation enabled */
> -#define PHY_CONF_LOOP 0x0002 /* 1 loopback mode enabled */
> -#define PHY_CONF_SPMASK 0x00f0 /* mask for speed */
> -#define PHY_CONF_10HDX 0x0010 /* 10 Mbit half duplex supported */
> -#define PHY_CONF_10FDX 0x0020 /* 10 Mbit full duplex supported */
> -#define PHY_CONF_100HDX 0x0040 /* 100 Mbit half duplex supported */
> -#define PHY_CONF_100FDX 0x0080 /* 100 Mbit full duplex supported */
> -
> -#define PHY_STAT_LINK 0x0100 /* 1 up - 0 down */
> -#define PHY_STAT_FAULT 0x0200 /* 1 remote fault */
> -#define PHY_STAT_ANC 0x0400 /* 1 auto-negotiation complete */
> -#define PHY_STAT_SPMASK 0xf000 /* mask for speed */
> -#define PHY_STAT_10HDX 0x1000 /* 10 Mbit half duplex selected */
> -#define PHY_STAT_10FDX 0x2000 /* 10 Mbit full duplex selected */
> -#define PHY_STAT_100HDX 0x4000 /* 100 Mbit half duplex selected */
> -#define PHY_STAT_100FDX 0x8000 /* 100 Mbit full duplex selected */
> -
> -
> static int
> fec_enet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> {
> @@ -406,12 +334,6 @@ fec_enet_interrupt(int irq, void * dev_id)
> ret = IRQ_HANDLED;
> fec_enet_tx(dev);
> }
> -
> - if (int_events& FEC_ENET_MII) {
> - ret = IRQ_HANDLED;
> - fec_enet_mii(dev);
> - }
> -
> } while (int_events);
>
> return ret;
> @@ -607,827 +529,312 @@ rx_processing_done:
> spin_unlock(&fep->hw_lock);
> }
>
> -/* called from interrupt context */
> -static void
> -fec_enet_mii(struct net_device *dev)
> -{
> - struct fec_enet_private *fep;
> - mii_list_t *mip;
> -
> - fep = netdev_priv(dev);
> - spin_lock(&fep->mii_lock);
> -
> - if ((mip = mii_head) == NULL) {
> - printk("MII and no head!\n");
> - goto unlock;
> - }
> -
> - if (mip->mii_func != NULL)
> - (*(mip->mii_func))(readl(fep->hwp + FEC_MII_DATA), dev);
> -
> - mii_head = mip->mii_next;
> - mip->mii_next = mii_free;
> - mii_free = mip;
> -
> - if ((mip = mii_head) != NULL)
> - writel(mip->mii_regval, fep->hwp + FEC_MII_DATA);
> -
> -unlock:
> - spin_unlock(&fep->mii_lock);
> -}
> -
> -static int
> -mii_queue_unlocked(struct net_device *dev, int regval,
> - void (*func)(uint, struct net_device *))
> +/* ------------------------------------------------------------------------- */
> +#ifdef CONFIG_M5272
> +static void __inline__ fec_get_mac(struct net_device *dev)
> {
> - struct fec_enet_private *fep;
> - mii_list_t *mip;
> - int retval;
> -
> - /* Add PHY address to register command */
> - fep = netdev_priv(dev);
> + struct fec_enet_private *fep = netdev_priv(dev);
> + unsigned char *iap, tmpaddr[ETH_ALEN];
>
> - regval |= fep->phy_addr<< 23;
> - retval = 0;
> -
> - if ((mip = mii_free) != NULL) {
> - mii_free = mip->mii_next;
> - mip->mii_regval = regval;
> - mip->mii_func = func;
> - mip->mii_next = NULL;
> - if (mii_head) {
> - mii_tail->mii_next = mip;
> - mii_tail = mip;
> - } else {
> - mii_head = mii_tail = mip;
> - writel(regval, fep->hwp + FEC_MII_DATA);
> - }
> + if (FEC_FLASHMAC) {
> + /*
> + * Get MAC address from FLASH.
> + * If it is all 1's or 0's, use the default.
> + */
> + iap = (unsigned char *)FEC_FLASHMAC;
> + if ((iap[0] == 0) && (iap[1] == 0) && (iap[2] == 0) &&
> + (iap[3] == 0) && (iap[4] == 0) && (iap[5] == 0))
> + iap = fec_mac_default;
> + if ((iap[0] == 0xff) && (iap[1] == 0xff) && (iap[2] == 0xff) &&
> + (iap[3] == 0xff) && (iap[4] == 0xff) && (iap[5] == 0xff))
> + iap = fec_mac_default;
> } else {
> - retval = 1;
> + *((unsigned long *)&tmpaddr[0]) = readl(fep->hwp + FEC_ADDR_LOW);
> + *((unsigned short *)&tmpaddr[4]) = (readl(fep->hwp + FEC_ADDR_HIGH) >> 16);
> + iap = &tmpaddr[0];
> }
>
> - return retval;
> -}
> -
> -static int
> -mii_queue(struct net_device *dev, int regval,
> - void (*func)(uint, struct net_device *))
> -{
> - struct fec_enet_private *fep;
> - unsigned long flags;
> - int retval;
> - fep = netdev_priv(dev);
> - spin_lock_irqsave(&fep->mii_lock, flags);
> - retval = mii_queue_unlocked(dev, regval, func);
> - spin_unlock_irqrestore(&fep->mii_lock, flags);
> - return retval;
> -}
> -
> -static void mii_do_cmd(struct net_device *dev, const phy_cmd_t *c)
> -{
> - if(!c)
> - return;
> + memcpy(dev->dev_addr, iap, ETH_ALEN);
>
> - for (; c->mii_data != mk_mii_end; c++)
> - mii_queue(dev, c->mii_data, c->funct);
> + /* Adjust MAC if using default MAC address */
> + if (iap == fec_mac_default)
> + dev->dev_addr[ETH_ALEN-1] = fec_mac_default[ETH_ALEN-1] + fep->index;
> }
> +#endif
>
> -static void mii_parse_sr(uint mii_reg, struct net_device *dev)
> -{
> - struct fec_enet_private *fep = netdev_priv(dev);
> - volatile uint *s =&(fep->phy_status);
> - uint status;
> -
> - status = *s& ~(PHY_STAT_LINK | PHY_STAT_FAULT | PHY_STAT_ANC);
> -
> - if (mii_reg& 0x0004)
> - status |= PHY_STAT_LINK;
> - if (mii_reg& 0x0010)
> - status |= PHY_STAT_FAULT;
> - if (mii_reg& 0x0020)
> - status |= PHY_STAT_ANC;
> - *s = status;
> -}
> +/* ------------------------------------------------------------------------- */
>
> -static void mii_parse_cr(uint mii_reg, struct net_device *dev)
> +/*
> + * Phy section
> + */
> +static void fec_enet_adjust_link(struct net_device *dev)
> {
> struct fec_enet_private *fep = netdev_priv(dev);
> - volatile uint *s =&(fep->phy_status);
> - uint status;
> -
> - status = *s& ~(PHY_CONF_ANE | PHY_CONF_LOOP);
> -
> - if (mii_reg& 0x1000)
> - status |= PHY_CONF_ANE;
> - if (mii_reg& 0x4000)
> - status |= PHY_CONF_LOOP;
> - *s = status;
> -}
> + struct phy_device *phy_dev = fep->phy_dev;
> + unsigned long flags;
>
> -static void mii_parse_anar(uint mii_reg, struct net_device *dev)
> -{
> - struct fec_enet_private *fep = netdev_priv(dev);
> - volatile uint *s =&(fep->phy_status);
> - uint status;
> -
> - status = *s& ~(PHY_CONF_SPMASK);
> -
> - if (mii_reg& 0x0020)
> - status |= PHY_CONF_10HDX;
> - if (mii_reg& 0x0040)
> - status |= PHY_CONF_10FDX;
> - if (mii_reg& 0x0080)
> - status |= PHY_CONF_100HDX;
> - if (mii_reg& 0x00100)
> - status |= PHY_CONF_100FDX;
> - *s = status;
> -}
> + int status_change = 0;
>
> -/* ------------------------------------------------------------------------- */
> -/* The Level one LXT970 is used by many boards */
> + spin_lock_irqsave(&fep->hw_lock, flags);
>
> -#define MII_LXT970_MIRROR 16 /* Mirror register */
> -#define MII_LXT970_IER 17 /* Interrupt Enable Register */
> -#define MII_LXT970_ISR 18 /* Interrupt Status Register */
> -#define MII_LXT970_CONFIG 19 /* Configuration Register */
> -#define MII_LXT970_CSR 20 /* Chip Status Register */
> + /* Prevent a state halted on mii error */
> + if (fep->mii_timeout && phy_dev->state == PHY_HALTED) {
> + phy_dev->state = PHY_RESUMING;
> + goto spin_unlock;
> + }
>
> -static void mii_parse_lxt970_csr(uint mii_reg, struct net_device *dev)
> -{
> - struct fec_enet_private *fep = netdev_priv(dev);
> - volatile uint *s =&(fep->phy_status);
> - uint status;
> + /* Duplex link change */
> + if (phy_dev->link) {
> + if (fep->full_duplex != phy_dev->duplex) {
> + fec_restart(dev, phy_dev->duplex);
> + status_change = 1;
> + }
> + }
>
> - status = *s& ~(PHY_STAT_SPMASK);
> - if (mii_reg& 0x0800) {
> - if (mii_reg& 0x1000)
> - status |= PHY_STAT_100FDX;
> + /* Link on or off change */
> + if (phy_dev->link != fep->link) {
> + fep->link = phy_dev->link;
> + if (phy_dev->link)
> + fec_restart(dev, phy_dev->duplex);
> else
> - status |= PHY_STAT_100HDX;
> - } else {
> - if (mii_reg& 0x1000)
> - status |= PHY_STAT_10FDX;
> - else
> - status |= PHY_STAT_10HDX;
> + fec_stop(dev);
> + status_change = 1;
> }
> - *s = status;
> -}
> -
> -static phy_cmd_t const phy_cmd_lxt970_config[] = {
> - { mk_mii_read(MII_REG_CR), mii_parse_cr },
> - { mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> - { mk_mii_end, }
> - };
> -static phy_cmd_t const phy_cmd_lxt970_startup[] = { /* enable interrupts */
> - { mk_mii_write(MII_LXT970_IER, 0x0002), NULL },
> - { mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> - { mk_mii_end, }
> - };
> -static phy_cmd_t const phy_cmd_lxt970_ack_int[] = {
> - /* read SR and ISR to acknowledge */
> - { mk_mii_read(MII_REG_SR), mii_parse_sr },
> - { mk_mii_read(MII_LXT970_ISR), NULL },
> -
> - /* find out the current status */
> - { mk_mii_read(MII_LXT970_CSR), mii_parse_lxt970_csr },
> - { mk_mii_end, }
> - };
> -static phy_cmd_t const phy_cmd_lxt970_shutdown[] = { /* disable interrupts */
> - { mk_mii_write(MII_LXT970_IER, 0x0000), NULL },
> - { mk_mii_end, }
> - };
> -static phy_info_t const phy_info_lxt970 = {
> - .id = 0x07810000,
> - .name = "LXT970",
> - .config = phy_cmd_lxt970_config,
> - .startup = phy_cmd_lxt970_startup,
> - .ack_int = phy_cmd_lxt970_ack_int,
> - .shutdown = phy_cmd_lxt970_shutdown
> -};
>
> -/* ------------------------------------------------------------------------- */
> -/* The Level one LXT971 is used on some of my custom boards */
> -
> -/* register definitions for the 971 */
> +spin_unlock:
> + spin_unlock_irqrestore(&fep->hw_lock, flags);
>
> -#define MII_LXT971_PCR 16 /* Port Control Register */
> -#define MII_LXT971_SR2 17 /* Status Register 2 */
> -#define MII_LXT971_IER 18 /* Interrupt Enable Register */
> -#define MII_LXT971_ISR 19 /* Interrupt Status Register */
> -#define MII_LXT971_LCR 20 /* LED Control Register */
> -#define MII_LXT971_TCR 30 /* Transmit Control Register */
> + if (status_change)
> + phy_print_status(phy_dev);
> +}
>
> /*
> - * I had some nice ideas of running the MDIO faster...
> - * The 971 should support 8MHz and I tried it, but things acted really
> - * weird, so 2.5 MHz ought to be enough for anyone...
> + * NOTE: a MII transaction is during around 25 us, so polling it...
> */
> -
> -static void mii_parse_lxt971_sr2(uint mii_reg, struct net_device *dev)
> +static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
> {
> - struct fec_enet_private *fep = netdev_priv(dev);
> - volatile uint *s =&(fep->phy_status);
> - uint status;
> + struct fec_enet_private *fep = bus->priv;
> + int timeout = FEC_MII_TIMEOUT;
>
> - status = *s& ~(PHY_STAT_SPMASK | PHY_STAT_LINK | PHY_STAT_ANC);
> + fep->mii_timeout = 0;
>
> - if (mii_reg& 0x0400) {
> - fep->link = 1;
> - status |= PHY_STAT_LINK;
> - } else {
> - fep->link = 0;
> - }
> - if (mii_reg& 0x0080)
> - status |= PHY_STAT_ANC;
> - if (mii_reg& 0x4000) {
> - if (mii_reg& 0x0200)
> - status |= PHY_STAT_100FDX;
> - else
> - status |= PHY_STAT_100HDX;
> - } else {
> - if (mii_reg& 0x0200)
> - status |= PHY_STAT_10FDX;
> - else
> - status |= PHY_STAT_10HDX;
> + /* clear MII end of transfer bit*/
> + writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
> +
> + /* start a read op */
> + writel(FEC_MMFR_ST | FEC_MMFR_OP_READ |
> + FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> + FEC_MMFR_TA, fep->hwp + FEC_MII_DATA);
> +
> + /* wait for end of transfer */
> + while (!(readl(fep->hwp + FEC_IEVENT) & FEC_ENET_MII)) {
> + cpu_relax();
> + if (timeout-- < 0) {
> + fep->mii_timeout = 1;
> + printk(KERN_ERR "FEC: MDIO read timeout\n");
> + return -ETIMEDOUT;
> + }
> }
> - if (mii_reg& 0x0008)
> - status |= PHY_STAT_FAULT;
>
> - *s = status;
> + /* return value */
> + return FEC_MMFR_DATA(readl(fep->hwp + FEC_MII_DATA));
> }
>
> -static phy_cmd_t const phy_cmd_lxt971_config[] = {
> - /* limit to 10MBit because my prototype board
> - * doesn't work with 100. */
> - { mk_mii_read(MII_REG_CR), mii_parse_cr },
> - { mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> - { mk_mii_read(MII_LXT971_SR2), mii_parse_lxt971_sr2 },
> - { mk_mii_end, }
> - };
> -static phy_cmd_t const phy_cmd_lxt971_startup[] = { /* enable interrupts */
> - { mk_mii_write(MII_LXT971_IER, 0x00f2), NULL },
> - { mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> - { mk_mii_write(MII_LXT971_LCR, 0xd422), NULL }, /* LED config */
> - /* Somehow does the 971 tell me that the link is down
> - * the first read after power-up.
> - * read here to get a valid value in ack_int */
> - { mk_mii_read(MII_REG_SR), mii_parse_sr },
> - { mk_mii_end, }
> - };
> -static phy_cmd_t const phy_cmd_lxt971_ack_int[] = {
> - /* acknowledge the int before reading status ! */
> - { mk_mii_read(MII_LXT971_ISR), NULL },
> - /* find out the current status */
> - { mk_mii_read(MII_REG_SR), mii_parse_sr },
> - { mk_mii_read(MII_LXT971_SR2), mii_parse_lxt971_sr2 },
> - { mk_mii_end, }
> - };
> -static phy_cmd_t const phy_cmd_lxt971_shutdown[] = { /* disable interrupts */
> - { mk_mii_write(MII_LXT971_IER, 0x0000), NULL },
> - { mk_mii_end, }
> - };
> -static phy_info_t const phy_info_lxt971 = {
> - .id = 0x0001378e,
> - .name = "LXT971",
> - .config = phy_cmd_lxt971_config,
> - .startup = phy_cmd_lxt971_startup,
> - .ack_int = phy_cmd_lxt971_ack_int,
> - .shutdown = phy_cmd_lxt971_shutdown
> -};
> -
> -/* ------------------------------------------------------------------------- */
> -/* The Quality Semiconductor QS6612 is used on the RPX CLLF */
> -
> -/* register definitions */
> -
> -#define MII_QS6612_MCR 17 /* Mode Control Register */
> -#define MII_QS6612_FTR 27 /* Factory Test Register */
> -#define MII_QS6612_MCO 28 /* Misc. Control Register */
> -#define MII_QS6612_ISR 29 /* Interrupt Source Register */
> -#define MII_QS6612_IMR 30 /* Interrupt Mask Register */
> -#define MII_QS6612_PCR 31 /* 100BaseTx PHY Control Reg. */
> -
> -static void mii_parse_qs6612_pcr(uint mii_reg, struct net_device *dev)
> +static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
> + u16 value)
> {
> - struct fec_enet_private *fep = netdev_priv(dev);
> - volatile uint *s =&(fep->phy_status);
> - uint status;
> + struct fec_enet_private *fep = bus->priv;
> + int timeout = FEC_MII_TIMEOUT;
>
> - status = *s& ~(PHY_STAT_SPMASK);
> + fep->mii_timeout = 0;
>
> - switch((mii_reg>> 2)& 7) {
> - case 1: status |= PHY_STAT_10HDX; break;
> - case 2: status |= PHY_STAT_100HDX; break;
> - case 5: status |= PHY_STAT_10FDX; break;
> - case 6: status |= PHY_STAT_100FDX; break;
> -}
> -
> - *s = status;
> -}
> -
> -static phy_cmd_t const phy_cmd_qs6612_config[] = {
> - /* The PHY powers up isolated on the RPX,
> - * so send a command to allow operation.
> - */
> - { mk_mii_write(MII_QS6612_PCR, 0x0dc0), NULL },
> -
> - /* parse cr and anar to get some info */
> - { mk_mii_read(MII_REG_CR), mii_parse_cr },
> - { mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> - { mk_mii_end, }
> - };
> -static phy_cmd_t const phy_cmd_qs6612_startup[] = { /* enable interrupts */
> - { mk_mii_write(MII_QS6612_IMR, 0x003a), NULL },
> - { mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> - { mk_mii_end, }
> - };
> -static phy_cmd_t const phy_cmd_qs6612_ack_int[] = {
> - /* we need to read ISR, SR and ANER to acknowledge */
> - { mk_mii_read(MII_QS6612_ISR), NULL },
> - { mk_mii_read(MII_REG_SR), mii_parse_sr },
> - { mk_mii_read(MII_REG_ANER), NULL },
> -
> - /* read pcr to get info */
> - { mk_mii_read(MII_QS6612_PCR), mii_parse_qs6612_pcr },
> - { mk_mii_end, }
> - };
> -static phy_cmd_t const phy_cmd_qs6612_shutdown[] = { /* disable interrupts */
> - { mk_mii_write(MII_QS6612_IMR, 0x0000), NULL },
> - { mk_mii_end, }
> - };
> -static phy_info_t const phy_info_qs6612 = {
> - .id = 0x00181440,
> - .name = "QS6612",
> - .config = phy_cmd_qs6612_config,
> - .startup = phy_cmd_qs6612_startup,
> - .ack_int = phy_cmd_qs6612_ack_int,
> - .shutdown = phy_cmd_qs6612_shutdown
> -};
> -
> -/* ------------------------------------------------------------------------- */
> -/* AMD AM79C874 phy */
> + /* clear MII end of transfer bit*/
> + writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
>
> -/* register definitions for the 874 */
> + /* start a read op */
> + writel(FEC_MMFR_ST | FEC_MMFR_OP_READ |
> + FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
> + FEC_MMFR_TA | FEC_MMFR_DATA(value),
> + fep->hwp + FEC_MII_DATA);
> +
> + /* wait for end of transfer */
> + while (!(readl(fep->hwp + FEC_IEVENT) & FEC_ENET_MII)) {
> + cpu_relax();
> + if (timeout-- < 0) {
> + fep->mii_timeout = 1;
> + printk(KERN_ERR "FEC: MDIO write timeout\n");
> + return -ETIMEDOUT;
> + }
> + }
>
> -#define MII_AM79C874_MFR 16 /* Miscellaneous Feature Register */
> -#define MII_AM79C874_ICSR 17 /* Interrupt/Status Register */
> -#define MII_AM79C874_DR 18 /* Diagnostic Register */
> -#define MII_AM79C874_PMLR 19 /* Power and Loopback Register */
> -#define MII_AM79C874_MCR 21 /* ModeControl Register */
> -#define MII_AM79C874_DC 23 /* Disconnect Counter */
> -#define MII_AM79C874_REC 24 /* Recieve Error Counter */
> + return 0;
> +}
>
> -static void mii_parse_am79c874_dr(uint mii_reg, struct net_device *dev)
> +static int fec_enet_mdio_reset(struct mii_bus *bus)
> {
> - struct fec_enet_private *fep = netdev_priv(dev);
> - volatile uint *s =&(fep->phy_status);
> - uint status;
> -
> - status = *s& ~(PHY_STAT_SPMASK | PHY_STAT_ANC);
> -
> - if (mii_reg& 0x0080)
> - status |= PHY_STAT_ANC;
> - if (mii_reg& 0x0400)
> - status |= ((mii_reg& 0x0800) ? PHY_STAT_100FDX : PHY_STAT_100HDX);
> - else
> - status |= ((mii_reg& 0x0800) ? PHY_STAT_10FDX : PHY_STAT_10HDX);
> -
> - *s = status;
> + return 0;
> }
>
> -static phy_cmd_t const phy_cmd_am79c874_config[] = {
> - { mk_mii_read(MII_REG_CR), mii_parse_cr },
> - { mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> - { mk_mii_read(MII_AM79C874_DR), mii_parse_am79c874_dr },
> - { mk_mii_end, }
> - };
> -static phy_cmd_t const phy_cmd_am79c874_startup[] = { /* enable interrupts */
> - { mk_mii_write(MII_AM79C874_ICSR, 0xff00), NULL },
> - { mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> - { mk_mii_read(MII_REG_SR), mii_parse_sr },
> - { mk_mii_end, }
> - };
> -static phy_cmd_t const phy_cmd_am79c874_ack_int[] = {
> - /* find out the current status */
> - { mk_mii_read(MII_REG_SR), mii_parse_sr },
> - { mk_mii_read(MII_AM79C874_DR), mii_parse_am79c874_dr },
> - /* we only need to read ISR to acknowledge */
> - { mk_mii_read(MII_AM79C874_ICSR), NULL },
> - { mk_mii_end, }
> - };
> -static phy_cmd_t const phy_cmd_am79c874_shutdown[] = { /* disable interrupts */
> - { mk_mii_write(MII_AM79C874_ICSR, 0x0000), NULL },
> - { mk_mii_end, }
> - };
> -static phy_info_t const phy_info_am79c874 = {
> - .id = 0x00022561,
> - .name = "AM79C874",
> - .config = phy_cmd_am79c874_config,
> - .startup = phy_cmd_am79c874_startup,
> - .ack_int = phy_cmd_am79c874_ack_int,
> - .shutdown = phy_cmd_am79c874_shutdown
> -};
> -
> -
> -/* ------------------------------------------------------------------------- */
> -/* Kendin KS8721BL phy */
> -
> -/* register definitions for the 8721 */
> -
> -#define MII_KS8721BL_RXERCR 21
> -#define MII_KS8721BL_ICSR 27
> -#define MII_KS8721BL_PHYCR 31
> -
> -static phy_cmd_t const phy_cmd_ks8721bl_config[] = {
> - { mk_mii_read(MII_REG_CR), mii_parse_cr },
> - { mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> - { mk_mii_end, }
> - };
> -static phy_cmd_t const phy_cmd_ks8721bl_startup[] = { /* enable interrupts */
> - { mk_mii_write(MII_KS8721BL_ICSR, 0xff00), NULL },
> - { mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> - { mk_mii_read(MII_REG_SR), mii_parse_sr },
> - { mk_mii_end, }
> - };
> -static phy_cmd_t const phy_cmd_ks8721bl_ack_int[] = {
> - /* find out the current status */
> - { mk_mii_read(MII_REG_SR), mii_parse_sr },
> - /* we only need to read ISR to acknowledge */
> - { mk_mii_read(MII_KS8721BL_ICSR), NULL },
> - { mk_mii_end, }
> - };
> -static phy_cmd_t const phy_cmd_ks8721bl_shutdown[] = { /* disable interrupts */
> - { mk_mii_write(MII_KS8721BL_ICSR, 0x0000), NULL },
> - { mk_mii_end, }
> - };
> -static phy_info_t const phy_info_ks8721bl = {
> - .id = 0x00022161,
> - .name = "KS8721BL",
> - .config = phy_cmd_ks8721bl_config,
> - .startup = phy_cmd_ks8721bl_startup,
> - .ack_int = phy_cmd_ks8721bl_ack_int,
> - .shutdown = phy_cmd_ks8721bl_shutdown
> -};
> -
> -/* ------------------------------------------------------------------------- */
> -/* register definitions for the DP83848 */
> -
> -#define MII_DP8384X_PHYSTST 16 /* PHY Status Register */
> -
> -static void mii_parse_dp8384x_sr2(uint mii_reg, struct net_device *dev)
> +static int fec_enet_mii_probe(struct net_device *dev)
> {
> struct fec_enet_private *fep = netdev_priv(dev);
> - volatile uint *s =&(fep->phy_status);
> -
> - *s&= ~(PHY_STAT_SPMASK | PHY_STAT_LINK | PHY_STAT_ANC);
> -
> - /* Link up */
> - if (mii_reg& 0x0001) {
> - fep->link = 1;
> - *s |= PHY_STAT_LINK;
> - } else
> - fep->link = 0;
> - /* Status of link */
> - if (mii_reg& 0x0010) /* Autonegotioation complete */
> - *s |= PHY_STAT_ANC;
> - if (mii_reg& 0x0002) { /* 10MBps? */
> - if (mii_reg& 0x0004) /* Full Duplex? */
> - *s |= PHY_STAT_10FDX;
> - else
> - *s |= PHY_STAT_10HDX;
> - } else { /* 100 Mbps? */
> - if (mii_reg& 0x0004) /* Full Duplex? */
> - *s |= PHY_STAT_100FDX;
> - else
> - *s |= PHY_STAT_100HDX;
> - }
> - if (mii_reg& 0x0008)
> - *s |= PHY_STAT_FAULT;
> -}
> -
> -static phy_info_t phy_info_dp83848= {
> - 0x020005c9,
> - "DP83848",
> + struct phy_device *phy_dev = NULL;
> + int phy_addr;
>
> - (const phy_cmd_t []) { /* config */
> - { mk_mii_read(MII_REG_CR), mii_parse_cr },
> - { mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> - { mk_mii_read(MII_DP8384X_PHYSTST), mii_parse_dp8384x_sr2 },
> - { mk_mii_end, }
> - },
> - (const phy_cmd_t []) { /* startup - enable interrupts */
> - { mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> - { mk_mii_read(MII_REG_SR), mii_parse_sr },
> - { mk_mii_end, }
> - },
> - (const phy_cmd_t []) { /* ack_int - never happens, no interrupt */
> - { mk_mii_end, }
> - },
> - (const phy_cmd_t []) { /* shutdown */
> - { mk_mii_end, }
> - },
> -};
> + /* find the first phy */
> + for (phy_addr = 0; phy_addr < PHY_MAX_ADDR; phy_addr++) {
> + if (fep->mii_bus->phy_map[phy_addr]) {
> + phy_dev = fep->mii_bus->phy_map[phy_addr];
> + break;
> + }
> + }
>
> -static phy_info_t phy_info_lan8700 = {
> - 0x0007C0C,
> - "LAN8700",
> - (const phy_cmd_t []) { /* config */
> - { mk_mii_read(MII_REG_CR), mii_parse_cr },
> - { mk_mii_read(MII_REG_ANAR), mii_parse_anar },
> - { mk_mii_end, }
> - },
> - (const phy_cmd_t []) { /* startup */
> - { mk_mii_write(MII_REG_CR, 0x1200), NULL }, /* autonegotiate */
> - { mk_mii_read(MII_REG_SR), mii_parse_sr },
> - { mk_mii_end, }
> - },
> - (const phy_cmd_t []) { /* act_int */
> - { mk_mii_end, }
> - },
> - (const phy_cmd_t []) { /* shutdown */
> - { mk_mii_end, }
> - },
> -};
> -/* ------------------------------------------------------------------------- */
> + if (!phy_dev) {
> + printk(KERN_ERR "%s: no PHY found\n", dev->name);
> + return -ENODEV;
> + }
>
> -static phy_info_t const * const phy_info[] = {
> - &phy_info_lxt970,
> - &phy_info_lxt971,
> - &phy_info_qs6612,
> - &phy_info_am79c874,
> - &phy_info_ks8721bl,
> - &phy_info_dp83848,
> - &phy_info_lan8700,
> - NULL
> -};
> + /* attach the mac to the phy */
> + phy_dev = phy_connect(dev, dev_name(&phy_dev->dev),
> + &fec_enet_adjust_link, 0,
> + PHY_INTERFACE_MODE_MII);
> + if (IS_ERR(phy_dev)) {
> + printk(KERN_ERR "%s: Could not attach to PHY\n", dev->name);
> + return PTR_ERR(phy_dev);
> + }
>
> -/* ------------------------------------------------------------------------- */
> -#ifdef HAVE_mii_link_interrupt
> -static irqreturn_t
> -mii_link_interrupt(int irq, void * dev_id);
> + /* mask with MAC supported features */
> + phy_dev->supported &= PHY_BASIC_FEATURES;
> + phy_dev->advertising = phy_dev->supported;
>
> -/*
> - * This is specific to the MII interrupt setup of the M5272EVB.
> - */
> -static void __inline__ fec_request_mii_intr(struct net_device *dev)
> -{
> - if (request_irq(66, mii_link_interrupt, IRQF_DISABLED, "fec(MII)", dev) != 0)
> - printk("FEC: Could not allocate fec(MII) IRQ(66)!\n");
> -}
> + fep->phy_dev = phy_dev;
> + fep->link = 0;
> + fep->full_duplex = 0;
>
> -static void __inline__ fec_disable_phy_intr(struct net_device *dev)
> -{
> - free_irq(66, dev);
> + return 0;
> }
> -#endif
>
> -#ifdef CONFIG_M5272
> -static void __inline__ fec_get_mac(struct net_device *dev)
> +static int fec_enet_mii_init(struct platform_device *pdev)
> {
> + struct net_device *dev = platform_get_drvdata(pdev);
> struct fec_enet_private *fep = netdev_priv(dev);
> - unsigned char *iap, tmpaddr[ETH_ALEN];
> + int err = -ENXIO, i;
>
> - if (FEC_FLASHMAC) {
> - /*
> - * Get MAC address from FLASH.
> - * If it is all 1's or 0's, use the default.
> - */
> - iap = (unsigned char *)FEC_FLASHMAC;
> - if ((iap[0] == 0)&& (iap[1] == 0)&& (iap[2] == 0)&&
> - (iap[3] == 0)&& (iap[4] == 0)&& (iap[5] == 0))
> - iap = fec_mac_default;
> - if ((iap[0] == 0xff)&& (iap[1] == 0xff)&& (iap[2] == 0xff)&&
> - (iap[3] == 0xff)&& (iap[4] == 0xff)&& (iap[5] == 0xff))
> - iap = fec_mac_default;
> - } else {
> - *((unsigned long *)&tmpaddr[0]) = readl(fep->hwp + FEC_ADDR_LOW);
> - *((unsigned short *)&tmpaddr[4]) = (readl(fep->hwp + FEC_ADDR_HIGH)>> 16);
> - iap =&tmpaddr[0];
> - }
> -
> - memcpy(dev->dev_addr, iap, ETH_ALEN);
> -
> - /* Adjust MAC if using default MAC address */
> - if (iap == fec_mac_default)
> - dev->dev_addr[ETH_ALEN-1] = fec_mac_default[ETH_ALEN-1] + fep->index;
> -}
> -#endif
> + fep->mii_timeout = 0;
>
> -/* ------------------------------------------------------------------------- */
> -
> -static void mii_display_status(struct net_device *dev)
> -{
> - struct fec_enet_private *fep = netdev_priv(dev);
> - volatile uint *s =&(fep->phy_status);
> + /*
> + * Set MII speed to 2.5 MHz
> + */
> + fep->phy_speed = ((((clk_get_rate(fep->clk) / 2 + 4999999)
> + / 2500000) / 2) & 0x3F) << 1;
> + writel(fep->phy_speed, fep->hwp + FEC_MII_SPEED);
>
> - if (!fep->link&& !fep->old_link) {
> - /* Link is still down - don't print anything */
> - return;
> + fep->mii_bus = mdiobus_alloc();
> + if (fep->mii_bus == NULL) {
> + err = -ENOMEM;
> + goto err_out;
> }
>
> - printk("%s: status: ", dev->name);
> -
> - if (!fep->link) {
> - printk("link down");
> - } else {
> - printk("link up");
> -
> - switch(*s& PHY_STAT_SPMASK) {
> - case PHY_STAT_100FDX: printk(", 100MBit Full Duplex"); break;
> - case PHY_STAT_100HDX: printk(", 100MBit Half Duplex"); break;
> - case PHY_STAT_10FDX: printk(", 10MBit Full Duplex"); break;
> - case PHY_STAT_10HDX: printk(", 10MBit Half Duplex"); break;
> - default:
> - printk(", Unknown speed/duplex");
> - }
> -
> - if (*s& PHY_STAT_ANC)
> - printk(", auto-negotiation complete");
> + fep->mii_bus->name = "fec_enet_mii_bus";
> + fep->mii_bus->read = fec_enet_mdio_read;
> + fep->mii_bus->write = fec_enet_mdio_write;
> + fep->mii_bus->reset = fec_enet_mdio_reset;
> + snprintf(fep->mii_bus->id, MII_BUS_ID_SIZE, "%x", pdev->id);
> + fep->mii_bus->priv = fep;
> + fep->mii_bus->parent = &pdev->dev;
> +
> + fep->mii_bus->irq = kmalloc(sizeof(int) * PHY_MAX_ADDR, GFP_KERNEL);
> + if (!fep->mii_bus->irq) {
> + err = -ENOMEM;
> + goto err_out_free_mdiobus;
> }
>
> - if (*s& PHY_STAT_FAULT)
> - printk(", remote fault");
> -
> - printk(".\n");
> -}
> -
> -static void mii_display_config(struct work_struct *work)
> -{
> - struct fec_enet_private *fep = container_of(work, struct fec_enet_private, phy_task);
> - struct net_device *dev = fep->netdev;
> - uint status = fep->phy_status;
> + for (i = 0; i < PHY_MAX_ADDR; i++)
> + fep->mii_bus->irq[i] = PHY_POLL;
>
> - /*
> - ** When we get here, phy_task is already removed from
> - ** the workqueue. It is thus safe to allow to reuse it.
> - */
> - fep->mii_phy_task_queued = 0;
> - printk("%s: config: auto-negotiation ", dev->name);
> -
> - if (status& PHY_CONF_ANE)
> - printk("on");
> - else
> - printk("off");
> + platform_set_drvdata(dev, fep->mii_bus);
>
> - if (status& PHY_CONF_100FDX)
> - printk(", 100FDX");
> - if (status& PHY_CONF_100HDX)
> - printk(", 100HDX");
> - if (status& PHY_CONF_10FDX)
> - printk(", 10FDX");
> - if (status& PHY_CONF_10HDX)
> - printk(", 10HDX");
> - if (!(status& PHY_CONF_SPMASK))
> - printk(", No speed/duplex selected?");
> + if (mdiobus_register(fep->mii_bus))
> + goto err_out_free_mdio_irq;
>
> - if (status& PHY_CONF_LOOP)
> - printk(", loopback enabled");
> + if (fec_enet_mii_probe(dev) != 0)
> + goto err_out_unregister_bus;
>
> - printk(".\n");
> + return 0;
>
> - fep->sequence_done = 1;
> +err_out_unregister_bus:
> + mdiobus_unregister(fep->mii_bus);
> +err_out_free_mdio_irq:
> + kfree(fep->mii_bus->irq);
> +err_out_free_mdiobus:
> + mdiobus_free(fep->mii_bus);
> +err_out:
> + return err;
> }
>
> -static void mii_relink(struct work_struct *work)
> +static void fec_enet_mii_remove(struct fec_enet_private *fep)
> {
> - struct fec_enet_private *fep = container_of(work, struct fec_enet_private, phy_task);
> - struct net_device *dev = fep->netdev;
> - int duplex;
> -
> - /*
> - ** When we get here, phy_task is already removed from
> - ** the workqueue. It is thus safe to allow to reuse it.
> - */
> - fep->mii_phy_task_queued = 0;
> - fep->link = (fep->phy_status& PHY_STAT_LINK) ? 1 : 0;
> - mii_display_status(dev);
> - fep->old_link = fep->link;
> -
> - if (fep->link) {
> - duplex = 0;
> - if (fep->phy_status
> - & (PHY_STAT_100FDX | PHY_STAT_10FDX))
> - duplex = 1;
> - fec_restart(dev, duplex);
> - } else
> - fec_stop(dev);
> + if (fep->phy_dev)
> + phy_disconnect(fep->phy_dev);
> + mdiobus_unregister(fep->mii_bus);
> + kfree(fep->mii_bus->irq);
> + mdiobus_free(fep->mii_bus);
> }
>
> -/* mii_queue_relink is called in interrupt context from mii_link_interrupt */
> -static void mii_queue_relink(uint mii_reg, struct net_device *dev)
> +static int fec_enet_get_settings(struct net_device *dev,
> + struct ethtool_cmd *cmd)
> {
> struct fec_enet_private *fep = netdev_priv(dev);
> + struct phy_device *phydev = fep->phy_dev;
>
> - /*
> - * We cannot queue phy_task twice in the workqueue. It
> - * would cause an endless loop in the workqueue.
> - * Fortunately, if the last mii_relink entry has not yet been
> - * executed now, it will do the job for the current interrupt,
> - * which is just what we want.
> - */
> - if (fep->mii_phy_task_queued)
> - return;
> + if (!phydev)
> + return -ENODEV;
>
> - fep->mii_phy_task_queued = 1;
> - INIT_WORK(&fep->phy_task, mii_relink);
> - schedule_work(&fep->phy_task);
> + return phy_ethtool_gset(phydev, cmd);
> }
>
> -/* mii_queue_config is called in interrupt context from fec_enet_mii */
> -static void mii_queue_config(uint mii_reg, struct net_device *dev)
> +static int fec_enet_set_settings(struct net_device *dev,
> + struct ethtool_cmd *cmd)
> {
> struct fec_enet_private *fep = netdev_priv(dev);
> + struct phy_device *phydev = fep->phy_dev;
>
> - if (fep->mii_phy_task_queued)
> - return;
> + if (!phydev)
> + return -ENODEV;
>
> - fep->mii_phy_task_queued = 1;
> - INIT_WORK(&fep->phy_task, mii_display_config);
> - schedule_work(&fep->phy_task);
> + return phy_ethtool_sset(phydev, cmd);
> }
>
> -phy_cmd_t const phy_cmd_relink[] = {
> - { mk_mii_read(MII_REG_CR), mii_queue_relink },
> - { mk_mii_end, }
> - };
> -phy_cmd_t const phy_cmd_config[] = {
> - { mk_mii_read(MII_REG_CR), mii_queue_config },
> - { mk_mii_end, }
> - };
> -
> -/* Read remainder of PHY ID. */
> -static void
> -mii_discover_phy3(uint mii_reg, struct net_device *dev)
> +static void fec_enet_get_drvinfo(struct net_device *dev,
> + struct ethtool_drvinfo *info)
> {
> - struct fec_enet_private *fep;
> - int i;
> -
> - fep = netdev_priv(dev);
> - fep->phy_id |= (mii_reg& 0xffff);
> - printk("fec: PHY @ 0x%x, ID 0x%08x", fep->phy_addr, fep->phy_id);
> -
> - for(i = 0; phy_info[i]; i++) {
> - if(phy_info[i]->id == (fep->phy_id>> 4))
> - break;
> - }
> -
> - if (phy_info[i])
> - printk(" -- %s\n", phy_info[i]->name);
> - else
> - printk(" -- unknown PHY!\n");
> + struct fec_enet_private *fep = netdev_priv(dev);
>
> - fep->phy = phy_info[i];
> - fep->phy_id_done = 1;
> + strcpy(info->driver, fep->pdev->dev.driver->name);
> + strcpy(info->version, "Revision: 1.0");
> + strcpy(info->bus_info, dev_name(&dev->dev));
> }
>
> -/* Scan all of the MII PHY addresses looking for someone to respond
> - * with a valid ID. This usually happens quickly.
> - */
> -static void
> -mii_discover_phy(uint mii_reg, struct net_device *dev)
> -{
> - struct fec_enet_private *fep;
> - uint phytype;
> -
> - fep = netdev_priv(dev);
> -
> - if (fep->phy_addr< 32) {
> - if ((phytype = (mii_reg& 0xffff)) != 0xffff&& phytype != 0) {
> -
> - /* Got first part of ID, now get remainder */
> - fep->phy_id = phytype<< 16;
> - mii_queue_unlocked(dev, mk_mii_read(MII_REG_PHYIR2),
> - mii_discover_phy3);
> - } else {
> - fep->phy_addr++;
> - mii_queue_unlocked(dev, mk_mii_read(MII_REG_PHYIR1),
> - mii_discover_phy);
> - }
> - } else {
> - printk("FEC: No PHY device found.\n");
> - /* Disable external MII interface */
> - writel(0, fep->hwp + FEC_MII_SPEED);
> - fep->phy_speed = 0;
> -#ifdef HAVE_mii_link_interrupt
> - fec_disable_phy_intr(dev);
> -#endif
> - }
> -}
> +static struct ethtool_ops fec_enet_ethtool_ops = {
> + .get_settings = fec_enet_get_settings,
> + .set_settings = fec_enet_set_settings,
> + .get_drvinfo = fec_enet_get_drvinfo,
> + .get_link = ethtool_op_get_link,
> +};
>
> -/* This interrupt occurs when the PHY detects a link change */
> -#ifdef HAVE_mii_link_interrupt
> -static irqreturn_t
> -mii_link_interrupt(int irq, void * dev_id)
> +static int fec_enet_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
> {
> - struct net_device *dev = dev_id;
> struct fec_enet_private *fep = netdev_priv(dev);
> + struct phy_device *phydev = fep->phy_dev;
>
> - mii_do_cmd(dev, fep->phy->ack_int);
> - mii_do_cmd(dev, phy_cmd_relink); /* restart and display status */
> + if (!netif_running(dev))
> + return -EINVAL;
>
> - return IRQ_HANDLED;
> + if (!phydev)
> + return -ENODEV;
> +
> + return phy_mii_ioctl(phydev, if_mii(rq), cmd);
> }
> -#endif
>
> static void fec_enet_free_buffers(struct net_device *dev)
> {
> @@ -1509,35 +916,8 @@ fec_enet_open(struct net_device *dev)
> if (ret)
> return ret;
>
> - fep->sequence_done = 0;
> - fep->link = 0;
> -
> - fec_restart(dev, 1);
> -
> - if (fep->phy) {
> - mii_do_cmd(dev, fep->phy->ack_int);
> - mii_do_cmd(dev, fep->phy->config);
> - mii_do_cmd(dev, phy_cmd_config); /* display configuration */
> -
> - /* Poll until the PHY tells us its configuration
> - * (not link state).
> - * Request is initiated by mii_do_cmd above, but answer
> - * comes by interrupt.
> - * This should take about 25 usec per register at 2.5 MHz,
> - * and we read approximately 5 registers.
> - */
> - while(!fep->sequence_done)
> - schedule();
> -
> - mii_do_cmd(dev, fep->phy->startup);
> - }
> -
> - /* Set the initial link state to true. A lot of hardware
> - * based on this device does not implement a PHY interrupt,
> - * so we are never notified of link change.
> - */
> - fep->link = 1;
> -
> + /* schedule a link state check */
> + phy_start(fep->phy_dev);
> netif_start_queue(dev);
> fep->opened = 1;
> return 0;
> @@ -1550,6 +930,7 @@ fec_enet_close(struct net_device *dev)
>
> /* Don't know what to do yet. */
> fep->opened = 0;
> + phy_stop(fep->phy_dev);
> netif_stop_queue(dev);
> fec_stop(dev);
>
> @@ -1666,6 +1047,7 @@ static const struct net_device_ops fec_netdev_ops = {
> .ndo_validate_addr = eth_validate_addr,
> .ndo_tx_timeout = fec_timeout,
> .ndo_set_mac_address = fec_set_mac_address,
> + .ndo_do_ioctl = fec_enet_ioctl,
> };
>
> /*
> @@ -1689,7 +1071,6 @@ static int fec_enet_init(struct net_device *dev, int index)
> }
>
> spin_lock_init(&fep->hw_lock);
> - spin_lock_init(&fep->mii_lock);
>
> fep->index = index;
> fep->hwp = (void __iomem *)dev->base_addr;
> @@ -1716,16 +1097,10 @@ static int fec_enet_init(struct net_device *dev, int index)
> fep->rx_bd_base = cbd_base;
> fep->tx_bd_base = cbd_base + RX_RING_SIZE;
>
> -#ifdef HAVE_mii_link_interrupt
> - fec_request_mii_intr(dev);
> -#endif
> /* The FEC Ethernet specific entries in the device structure */
> dev->watchdog_timeo = TX_TIMEOUT;
> dev->netdev_ops =&fec_netdev_ops;
> -
> - for (i=0; i<NMII-1; i++)
> - mii_cmds[i].mii_next =&mii_cmds[i+1];
> - mii_free = mii_cmds;
> + dev->ethtool_ops =&fec_enet_ethtool_ops;
>
> /* Set MII speed to 2.5 MHz */
> fep->phy_speed = ((((clk_get_rate(fep->clk) / 2 + 4999999)
> @@ -1760,13 +1135,6 @@ static int fec_enet_init(struct net_device *dev, int index)
>
> fec_restart(dev, 0);
>
> - /* Queue up command to detect the PHY and initialize the
> - * remainder of the interface.
> - */
> - fep->phy_id_done = 0;
> - fep->phy_addr = 0;
> - mii_queue(dev, mk_mii_read(MII_REG_PHYIR1), mii_discover_phy);
> -
> return 0;
> }
>
> @@ -1835,8 +1203,7 @@ fec_restart(struct net_device *dev, int duplex)
> writel(0, fep->hwp + FEC_R_DES_ACTIVE);
>
> /* Enable interrupts we wish to service */
> - writel(FEC_ENET_TXF | FEC_ENET_RXF | FEC_ENET_MII,
> - fep->hwp + FEC_IMASK);
> + writel(FEC_ENET_TXF | FEC_ENET_RXF, fep->hwp + FEC_IMASK);
> }
>
> static void
> @@ -1859,7 +1226,6 @@ fec_stop(struct net_device *dev)
> /* Clear outstanding MII command interrupts. */
> writel(FEC_ENET_MII, fep->hwp + FEC_IEVENT);
>
> - writel(FEC_ENET_MII, fep->hwp + FEC_IMASK);
> writel(fep->phy_speed, fep->hwp + FEC_MII_SPEED);
> }
>
> @@ -1891,6 +1257,7 @@ fec_probe(struct platform_device *pdev)
> memset(fep, 0, sizeof(*fep));
>
> ndev->base_addr = (unsigned long)ioremap(r->start, resource_size(r));
> + fep->pdev = pdev;
>
> if (!ndev->base_addr) {
> ret = -ENOMEM;
> @@ -1926,13 +1293,24 @@ fec_probe(struct platform_device *pdev)
> if (ret)
> goto failed_init;
>
> + ret = fec_enet_mii_init(pdev);
> + if (ret)
> + goto failed_mii_init;
> +
> ret = register_netdev(ndev);
> if (ret)
> goto failed_register;
>
> + printk(KERN_INFO "%s: Freescale FEC PHY driver [%s] "
> + "(mii_bus:phy_addr=%s, irq=%d)\n", ndev->name,
> + fep->phy_dev->drv->name, dev_name(&fep->phy_dev->dev),
> + fep->phy_dev->irq);
> +
> return 0;
>
> failed_register:
> + fec_enet_mii_remove(fep);
> +failed_mii_init:
> failed_init:
> clk_disable(fep->clk);
> clk_put(fep->clk);
> @@ -1959,6 +1337,7 @@ fec_drv_remove(struct platform_device *pdev)
> platform_set_drvdata(pdev, NULL);
>
> fec_stop(ndev);
> + fec_enet_mii_remove(fep);
> clk_disable(fep->clk);
> clk_put(fep->clk);
> iounmap((void __iomem *)ndev->base_addr);
--
Bryan Wu <bryan.wu@canonical.com>
Kernel Developer +86.138-1617-6545 Mobile
Ubuntu Kernel Team | Hardware Enablement Team
Canonical Ltd. www.canonical.com
Ubuntu - Linux for human beings | www.ubuntu.com
^ permalink raw reply
* Re: [PATCH 0/3] sky2 minor driver updates
From: David Miller @ 2010-03-31 2:45 UTC (permalink / raw)
To: shemminger; +Cc: netdev
In-Reply-To: <20100329173617.765470658@vyatta.com>
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 29 Mar 2010 10:36:17 -0700
> These are minor changes related to chip versions as described
> in current Marvell driver.
Applied to net-next-2.6, thanks.
^ permalink raw reply
* Re: [PATCH] net_sched: minor netns related cleanup
From: David Miller @ 2010-03-31 2:45 UTC (permalink / raw)
To: thomas.goff; +Cc: adobriyan, netdev
In-Reply-To: <20100331024354.GA6631@boeing.com>
From: Tom Goff <thomas.goff@boeing.com>
Date: Tue, 30 Mar 2010 19:43:54 -0700
> These changes were suggested by Alexey Dobriyan <adobriyan@gmail.com>:
>
> - psched_show() does not use any private data so just pass NULL to
> psched_open()
>
> - remove unnecessary return statement
>
> Signed-off-by: Tom Goff <thomas.goff@boeing.com>
Applied, thanks.
^ permalink raw reply