* [PATCH net-next] inet: add rfc 3168 extract in front of INET_ECN_encapsulate()
From: Eric Dumazet @ 2011-10-22 5:11 UTC
To: David Miller; +Cc: netdev
INET_ECN_encapsulate() is better understood if we can read the official
statement.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
include/net/inet_ecn.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
index 2fa8d13..2fa1469 100644
--- a/include/net/inet_ecn.h
+++ b/include/net/inet_ecn.h
@@ -30,6 +30,14 @@ static inline int INET_ECN_is_capable(__u8 dsfield)
return dsfield & INET_ECN_ECT_0;
}
+/*
+ * RFC 3168 9.1.1
+ * The full-functionality option for ECN encapsulation is to copy the
+ * ECN codepoint of the inside header to the outside header on
+ * encapsulation if the inside header is not-ECT or ECT, and to set the
+ * ECN codepoint of the outside header to ECT(0) if the ECN codepoint of
+ * the inside header is CE.
+ */
static inline __u8 INET_ECN_encapsulate(__u8 outer, __u8 inner)
{
outer &= ~INET_ECN_MASK;
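As an illustration of the RFC 3168 9.1.1 rule quoted in the comment above, here is a minimal standalone sketch. It is not the kernel's code. The `ECN_*` names are local stand-ins for the codepoints defined by RFC 3168: Not-ECT = 00, ECT(1) = 01, ECT(0) = 10, CE = 11, held in the low two bits of the TOS byte. The inner codepoint is copied outward unless it is CE, in which case the outer header gets ECT(0).

```c
#include <stdint.h>

/* ECN codepoints in the low two bits of the TOS / traffic-class byte
 * (RFC 3168 section 5). Local names, not the kernel's INET_ECN_* macros. */
enum {
	ECN_NOT_ECT = 0x0,
	ECN_ECT_1   = 0x1,
	ECN_ECT_0   = 0x2,
	ECN_CE      = 0x3,
	ECN_MASK    = 0x3,
};

/* Full-functionality encapsulation (RFC 3168 9.1.1): keep the outer
 * DSCP bits, copy the inner ECN codepoint if it is Not-ECT or ECT,
 * and use ECT(0) instead if the inner codepoint is CE. */
static uint8_t ecn_encapsulate(uint8_t outer, uint8_t inner)
{
	uint8_t ecn = inner & ECN_MASK;

	outer &= (uint8_t)~ECN_MASK;   /* clear the outer ECN field */
	if (ecn == ECN_CE)
		ecn = ECN_ECT_0;       /* never propagate CE outward */
	return outer | ecn;
}
```

For example, `ecn_encapsulate(0xb8, ECN_CE)` yields `0xba`: DSCP EF is preserved and CE becomes ECT(0).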
* Re: [PATCH net-next] inet: add rfc 3168 extract in front of INET_ECN_encapsulate()
From: David Miller @ 2011-10-22 5:26 UTC
To: eric.dumazet; +Cc: netdev
In-Reply-To: <1319260268.6180.12.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 22 Oct 2011 07:11:08 +0200
> INET_ECN_encapsulate() is better understood if we can read the official
> statement.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied, thanks Eric.
* [RFC v2 PATCH 0/4] Support sending gratuitous packets by guest
From: Jason Wang @ 2011-10-22 5:38 UTC
To: aliguori, quintela, jan.kiszka, mst, qemu-devel, blauwirbel
Cc: pbonzini, rusty, kvm, netdev
We only track the primary MAC address in qemu and send RARP packets after
migration to tell the switch to update its MAC address table. This
may not work when the guest has complicated network configurations such
as tagged VLANs or IPv6; those connections may be lost or stall after
migration.
One way to handle them is to snoop the network traffic in qemu and
record which MAC addresses are in use, but this would hurt performance and is
impossible for network backends such as vhost.
So, to solve this issue, the best approach is to let the guest,
instead of qemu, send the gratuitous packets. This series first adds a
model-specific hook that lets a NIC model implement its own
announce function, and then implements a virtio-net specific function that
lets the guest send the gratuitous packets.
Only basic testing was done.
Comments are welcome.
Thanks
---
Jason Wang (4):
announce self after vm start
net: export announce_self_create()
net: model specific announcing support
virtio-net: notify guest to announce itself
hw/virtio-net.c | 20 +++++++++++++++++++-
hw/virtio-net.h | 2 ++
migration.c | 1 -
net.c | 31 +++++++++++++++++++++++++++++++
net.h | 3 +++
savevm.c | 40 +++++-----------------------------------
vl.c | 1 +
7 files changed, 61 insertions(+), 37 deletions(-)
--
Jason Wang
* [RFC v2 PATCH 1/4] announce self after vm start
From: Jason Wang @ 2011-10-22 5:38 UTC
To: aliguori, quintela, jan.kiszka, mst, qemu-devel, blauwirbel
Cc: pbonzini, rusty, kvm, netdev
In-Reply-To: <20111022053540.21526.61249.stgit@dhcp-8-146.nay.redhat.com>
We send gratuitous packets to let the switch update its MAC address
table. Currently this is only done after migration, because the guest
may have moved to a host connected to a different switch port.
Unfortunately, this kind of notification is also needed when continuing a
stopped VM, as the MAC address table entry may no longer exist because of
aging. This patch solves this by calling qemu_announce_self() in
vm_start() instead of in process_incoming_migration(). With this change,
gratuitous packets are sent each time the VM starts.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
migration.c | 1 -
vl.c | 1 +
2 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/migration.c b/migration.c
index 77a51ad..3326b02 100644
--- a/migration.c
+++ b/migration.c
@@ -67,7 +67,6 @@ void process_incoming_migration(QEMUFile *f)
fprintf(stderr, "load of migration failed\n");
exit(0);
}
- qemu_announce_self();
DPRINTF("successfully loaded vm state\n");
if (autostart) {
diff --git a/vl.c b/vl.c
index dbf7778..e4408e0 100644
--- a/vl.c
+++ b/vl.c
@@ -1262,6 +1262,7 @@ void vm_start(void)
vm_state_notify(1, RUN_STATE_RUNNING);
resume_all_vcpus();
monitor_protocol_event(QEVENT_RESUME, NULL);
+ qemu_announce_self();
}
}
* [RFC v2 PATCH 2/4] net: export announce_self_create()
From: Jason Wang @ 2011-10-22 5:38 UTC
To: aliguori, quintela, jan.kiszka, mst, qemu-devel, blauwirbel
Cc: pbonzini, rusty, kvm, netdev
In-Reply-To: <20111022053540.21526.61249.stgit@dhcp-8-146.nay.redhat.com>
Export announce_self_create() and move it to net.c so that it can be used by
model-specific announcing functions.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
net.c | 31 +++++++++++++++++++++++++++++++
net.h | 1 +
savevm.c | 32 --------------------------------
3 files changed, 32 insertions(+), 32 deletions(-)
diff --git a/net.c b/net.c
index d05930c..516ff9e 100644
--- a/net.c
+++ b/net.c
@@ -42,6 +42,37 @@ static QTAILQ_HEAD(, VLANClientState) non_vlan_clients;
int default_net = 1;
+#ifndef ETH_P_RARP
+#define ETH_P_RARP 0x8035
+#endif
+#define ARP_HTYPE_ETH 0x0001
+#define ARP_PTYPE_IP 0x0800
+#define ARP_OP_REQUEST_REV 0x3
+
+int announce_self_create(uint8_t *buf, uint8_t *mac_addr)
+{
+ /* Ethernet header. */
+ memset(buf, 0xff, 6); /* destination MAC addr */
+ memcpy(buf + 6, mac_addr, 6); /* source MAC addr */
+ *(uint16_t *)(buf + 12) = htons(ETH_P_RARP); /* ethertype */
+
+ /* RARP header. */
+ *(uint16_t *)(buf + 14) = htons(ARP_HTYPE_ETH); /* hardware addr space */
+ *(uint16_t *)(buf + 16) = htons(ARP_PTYPE_IP); /* protocol addr space */
+ *(buf + 18) = 6; /* hardware addr length (ethernet) */
+ *(buf + 19) = 4; /* protocol addr length (IPv4) */
+ *(uint16_t *)(buf + 20) = htons(ARP_OP_REQUEST_REV); /* opcode */
+ memcpy(buf + 22, mac_addr, 6); /* source hw addr */
+ memset(buf + 28, 0x00, 4); /* source protocol addr */
+ memcpy(buf + 32, mac_addr, 6); /* target hw addr */
+ memset(buf + 38, 0x00, 4); /* target protocol addr */
+
+ /* Padding to get up to 60 bytes (ethernet min packet size, minus FCS). */
+ memset(buf + 42, 0x00, 18);
+
+ return 60; /* len (FCS will be added by hardware) */
+}
+
/***********************************************************/
/* network device redirectors */
diff --git a/net.h b/net.h
index 9f633f8..4943d4b 100644
--- a/net.h
+++ b/net.h
@@ -178,5 +178,6 @@ int do_netdev_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
void qdev_set_nic_properties(DeviceState *dev, NICInfo *nd);
int net_handle_fd_param(Monitor *mon, const char *param);
+int announce_self_create(uint8_t *buf, uint8_t *mac_addr);
#endif
diff --git a/savevm.c b/savevm.c
index bf4d0e7..8293ee6 100644
--- a/savevm.c
+++ b/savevm.c
@@ -85,38 +85,6 @@
#define SELF_ANNOUNCE_ROUNDS 5
-#ifndef ETH_P_RARP
-#define ETH_P_RARP 0x8035
-#endif
-#define ARP_HTYPE_ETH 0x0001
-#define ARP_PTYPE_IP 0x0800
-#define ARP_OP_REQUEST_REV 0x3
-
-static int announce_self_create(uint8_t *buf,
- uint8_t *mac_addr)
-{
- /* Ethernet header. */
- memset(buf, 0xff, 6); /* destination MAC addr */
- memcpy(buf + 6, mac_addr, 6); /* source MAC addr */
- *(uint16_t *)(buf + 12) = htons(ETH_P_RARP); /* ethertype */
-
- /* RARP header. */
- *(uint16_t *)(buf + 14) = htons(ARP_HTYPE_ETH); /* hardware addr space */
- *(uint16_t *)(buf + 16) = htons(ARP_PTYPE_IP); /* protocol addr space */
- *(buf + 18) = 6; /* hardware addr length (ethernet) */
- *(buf + 19) = 4; /* protocol addr length (IPv4) */
- *(uint16_t *)(buf + 20) = htons(ARP_OP_REQUEST_REV); /* opcode */
- memcpy(buf + 22, mac_addr, 6); /* source hw addr */
- memset(buf + 28, 0x00, 4); /* source protocol addr */
- memcpy(buf + 32, mac_addr, 6); /* target hw addr */
- memset(buf + 38, 0x00, 4); /* target protocol addr */
-
- /* Padding to get up to 60 bytes (ethernet min packet size, minus FCS). */
- memset(buf + 42, 0x00, 18);
-
- return 60; /* len (FCS will be added by hardware) */
-}
-
static void qemu_announce_self_iter(NICState *nic, void *opaque)
{
uint8_t buf[60];
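The frame built by announce_self_create() can be restated as a standalone sketch. It uses the same offsets and constants as the patch, but writes 16-bit fields one byte at a time instead of through `*(uint16_t *)` casts, so it makes no assumption about buffer alignment. This is a portability-oriented variant; the names `put_be16`, `build_rarp_announce` and `RARP_FRAME_LEN` are hypothetical and not part of QEMU.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RARP_FRAME_LEN 60 /* minimum Ethernet frame, excluding the 4-byte FCS */

/* Store a 16-bit value in network byte order without assuming alignment. */
static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/* Build the RARP "request reverse" announcement for one MAC address.
 * buf must hold at least RARP_FRAME_LEN bytes; returns the frame length. */
static size_t build_rarp_announce(uint8_t *buf, const uint8_t mac[6])
{
	/* Ethernet header */
	memset(buf, 0xff, 6);          /* destination: broadcast */
	memcpy(buf + 6, mac, 6);       /* source MAC */
	put_be16(buf + 12, 0x8035);    /* ethertype: RARP */

	/* RARP header */
	put_be16(buf + 14, 0x0001);    /* hardware type: Ethernet */
	put_be16(buf + 16, 0x0800);    /* protocol type: IPv4 */
	buf[18] = 6;                   /* hardware address length */
	buf[19] = 4;                   /* protocol address length */
	put_be16(buf + 20, 0x0003);    /* opcode: request reverse */
	memcpy(buf + 22, mac, 6);      /* sender hardware address */
	memset(buf + 28, 0, 4);        /* sender protocol address (unknown) */
	memcpy(buf + 32, mac, 6);      /* target hardware address */
	memset(buf + 38, 0, 4);        /* target protocol address (unknown) */

	/* Pad the 42-byte payload up to the 60-byte minimum. */
	memset(buf + 42, 0, RARP_FRAME_LEN - 42);
	return RARP_FRAME_LEN;
}
```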
* [RFC v2 PATCH 3/4] net: model specific announcing support
From: Jason Wang @ 2011-10-22 5:38 UTC
To: aliguori, quintela, jan.kiszka, mst, qemu-devel, blauwirbel
Cc: pbonzini, rusty, kvm, netdev
In-Reply-To: <20111022053540.21526.61249.stgit@dhcp-8-146.nay.redhat.com>
This patch introduces a function pointer in NetClientInfo, which is
called during self-announcement to perform a model-specific announcement,
such as sending gratuitous packets. The previous method is kept as a
fallback when the model-specific announcement fails or the model does
not provide one.
The first user will be virtio-net.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
net.h | 2 ++
savevm.c | 8 +++++---
2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/net.h b/net.h
index 4943d4b..1845f01 100644
--- a/net.h
+++ b/net.h
@@ -46,6 +46,7 @@ typedef ssize_t (NetReceive)(VLANClientState *, const uint8_t *, size_t);
typedef ssize_t (NetReceiveIOV)(VLANClientState *, const struct iovec *, int);
typedef void (NetCleanup) (VLANClientState *);
typedef void (LinkStatusChanged)(VLANClientState *);
+typedef int (NetAnnounce)(VLANClientState *);
typedef struct NetClientInfo {
net_client_type type;
@@ -57,6 +58,7 @@ typedef struct NetClientInfo {
NetCleanup *cleanup;
LinkStatusChanged *link_status_changed;
NetPoll *poll;
+ NetAnnounce *announce;
} NetClientInfo;
struct VLANClientState {
diff --git a/savevm.c b/savevm.c
index 8293ee6..de6a01a 100644
--- a/savevm.c
+++ b/savevm.c
@@ -89,10 +89,12 @@ static void qemu_announce_self_iter(NICState *nic, void *opaque)
{
uint8_t buf[60];
int len;
+ NetAnnounce *func = nic->nc.info->announce;
- len = announce_self_create(buf, nic->conf->macaddr.a);
-
- qemu_send_packet_raw(&nic->nc, buf, len);
+ if (func == NULL || func(&nic->nc) != 0) {
+ len = announce_self_create(buf, nic->conf->macaddr.a);
+ qemu_send_packet_raw(&nic->nc, buf, len);
+ }
}
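The dispatch added to qemu_announce_self_iter() can be modeled outside QEMU. In this sketch, `struct client`, `struct client_info`, `announce_one()` and `send_generic_rarp()` are hypothetical stand-ins for VLANClientState, NetClientInfo and the RARP path. A missing hook, or a hook that returns nonzero, falls back to the generic RARP frame.

```c
#include <stddef.h>

struct client;                         /* stand-in for VLANClientState */
typedef int (announce_fn)(struct client *);

struct client_info {                   /* stand-in for NetClientInfo */
	announce_fn *announce;         /* optional model-specific hook */
};

struct client {
	const struct client_info *info;
	int rarp_sent;                 /* counts generic fallbacks (demo only) */
};

/* Stands in for announce_self_create() + qemu_send_packet_raw(). */
static void send_generic_rarp(struct client *c)
{
	c->rarp_sent++;
}

/* Same decision as the patched qemu_announce_self_iter(): use the hook
 * if present and successful, otherwise send the generic RARP frame. */
static void announce_one(struct client *c)
{
	announce_fn *fn = c->info->announce;

	if (fn == NULL || fn(c) != 0)
		send_generic_rarp(c);
}

/* Example hooks: one that handles the announcement, one that declines. */
static int hook_ok(struct client *c)   { (void)c; return 0; }
static int hook_fail(struct client *c) { (void)c; return 1; }
```

Returning nonzero from the hook (as virtio_net_announce() does when the guest lacks the feature) keeps old guests covered by the host-side RARP.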
* [RFC v2 PATCH 4/4] virtio-net: notify guest to announce itself
From: Jason Wang @ 2011-10-22 5:39 UTC
To: aliguori, quintela, jan.kiszka, mst, qemu-devel, blauwirbel
Cc: pbonzini, rusty, kvm, netdev
In-Reply-To: <20111022053540.21526.61249.stgit@dhcp-8-146.nay.redhat.com>
It's hard to track all MAC addresses and their usage (VLANs, bonding,
IPv6) in qemu in order to send gratuitous packets from the qemu side, so
the better choice is to let the guest do it.
The patch introduces a new read-write config status bit for virtio-net,
VIRTIO_NET_S_ANNOUNCE, which is used to notify the guest to announce itself
(e.g. by sending gratuitous packets) through a config update
interrupt. When the guest has finished the announcement, it should clear
that bit.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
hw/virtio-net.c | 20 +++++++++++++++++++-
hw/virtio-net.h | 2 ++
2 files changed, 21 insertions(+), 1 deletions(-)
diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 8c2f460..7f844e7 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -95,6 +95,10 @@ static void virtio_net_set_config(VirtIODevice *vdev, const uint8_t *config)
memcpy(n->mac, netcfg.mac, ETH_ALEN);
qemu_format_nic_info_str(&n->nic->nc, n->mac);
}
+
+ if (memcmp(&netcfg.status, &n->status, sizeof(n->status))) {
+ memcpy(&n->status, &netcfg.status, sizeof(n->status));
+ }
}
static bool virtio_net_started(VirtIONet *n, uint8_t status)
@@ -227,7 +231,7 @@ static uint32_t virtio_net_get_features(VirtIODevice *vdev, uint32_t features)
{
VirtIONet *n = to_virtio_net(vdev);
- features |= (1 << VIRTIO_NET_F_MAC);
+ features |= (1 << VIRTIO_NET_F_MAC | 1 << VIRTIO_NET_F_GUEST_ANNOUNCE);
if (peer_has_vnet_hdr(n)) {
tap_using_vnet_hdr(n->nic->nc.peer, 1);
@@ -983,6 +987,19 @@ static void virtio_net_cleanup(VLANClientState *nc)
n->nic = NULL;
}
+static int virtio_net_announce(VLANClientState *nc)
+{
+ VirtIONet *n = DO_UPCAST(NICState, nc, nc)->opaque;
+
+ if (n->vdev.guest_features & (0x1 << VIRTIO_NET_F_GUEST_ANNOUNCE)) {
+ n->status |= VIRITO_NET_S_ANNOUNCE;
+ virtio_notify_config(&n->vdev);
+ return 0;
+ }
+
+ return 1;
+}
+
static NetClientInfo net_virtio_info = {
.type = NET_CLIENT_TYPE_NIC,
.size = sizeof(NICState),
@@ -990,6 +1007,7 @@ static NetClientInfo net_virtio_info = {
.receive = virtio_net_receive,
.cleanup = virtio_net_cleanup,
.link_status_changed = virtio_net_set_link_status,
+ .announce = virtio_net_announce,
};
VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
diff --git a/hw/virtio-net.h b/hw/virtio-net.h
index 4468741..c47bd52 100644
--- a/hw/virtio-net.h
+++ b/hw/virtio-net.h
@@ -44,8 +44,10 @@
#define VIRTIO_NET_F_CTRL_RX 18 /* Control channel RX mode support */
#define VIRTIO_NET_F_CTRL_VLAN 19 /* Control channel VLAN filtering */
#define VIRTIO_NET_F_CTRL_RX_EXTRA 20 /* Extra RX mode control support */
+#define VIRTIO_NET_F_GUEST_ANNOUNCE 21 /* Guest can announce itself */
#define VIRTIO_NET_S_LINK_UP 1 /* Link is up */
+#define VIRITO_NET_S_ANNOUNCE 2 /* Announcement is needed */
#define TX_TIMER_INTERVAL 150000 /* 150 us */
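The device side of the handshake in virtio_net_announce() reduces to a feature check plus a status-bit update. The sketch below uses hypothetical names (`struct net_dev`, `dev_request_announce`, `NET_S_ANNOUNCE`); note the patch's own header spells the status macro `VIRITO_NET_S_ANNOUNCE`. The interrupt is modeled as a counter in place of virtio_notify_config().

```c
#include <stdint.h>

#define NET_F_GUEST_ANNOUNCE 21  /* feature bit number proposed in this patch */
#define NET_S_LINK_UP        1
#define NET_S_ANNOUNCE       2

/* Hypothetical device model. */
struct net_dev {
	uint32_t guest_features;  /* feature bits acked by the guest driver */
	uint16_t status;          /* virtio_net_config.status as seen by guest */
	int config_irqs;          /* stands in for virtio_notify_config() calls */
};

/* Returns 0 if the guest was asked to announce itself, nonzero if the
 * caller should fall back to sending RARP from the host side. */
static int dev_request_announce(struct net_dev *d)
{
	if (d->guest_features & (UINT32_C(1) << NET_F_GUEST_ANNOUNCE)) {
		d->status |= NET_S_ANNOUNCE;
		d->config_irqs++;   /* raise a config-change interrupt */
		return 0;
	}
	return 1;
}
```

A guest whose driver never acked the feature bit sees no status change, and the host-side RARP fallback still runs.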
* [RFC v2 PATCH 5/4 PATCH] virtio-net: send gratuitous packet when needed
From: Jason Wang @ 2011-10-22 5:43 UTC
To: aliguori, quintela, jan.kiszka, mst, qemu-devel, blauwirbel
Cc: pbonzini, rusty, kvm, netdev
This lets the virtio-net driver send gratuitous packets when a new
config bit, VIRTIO_NET_S_ANNOUNCE, is set in a config update
interrupt. When the backend sets this bit, the driver schedules
a work item that sends the gratuitous packets through NETDEV_NOTIFY_PEERS.
This feature is negotiated through feature bit VIRTIO_NET_F_GUEST_ANNOUNCE.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 31 ++++++++++++++++++++++++++++++-
include/linux/virtio_net.h | 2 ++
2 files changed, 32 insertions(+), 1 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b8225f3..1cdecf7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -71,6 +71,9 @@ struct virtnet_info {
/* Work struct for refilling if we run low on memory. */
struct delayed_work refill;
+ /* Work struct for sending gratuitous packets. */
+ struct work_struct announce;
+
/* Chain pages by the private ptr. */
struct page *pages;
@@ -507,6 +510,13 @@ static void refill_work(struct work_struct *work)
schedule_delayed_work(&vi->refill, HZ/2);
}
+static void announce_work(struct work_struct *work)
+{
+ struct virtnet_info *vi = container_of(work, struct virtnet_info,
+ announce);
+ netif_notify_peers(vi->dev);
+}
+
static int virtnet_poll(struct napi_struct *napi, int budget)
{
struct virtnet_info *vi = container_of(napi, struct virtnet_info, napi);
@@ -923,11 +933,22 @@ static void virtnet_update_status(struct virtnet_info *vi)
&v, sizeof(v));
/* Ignore unknown (future) status bits */
- v &= VIRTIO_NET_S_LINK_UP;
+ v &= VIRTIO_NET_S_LINK_UP | VIRTIO_NET_S_ANNOUNCE;
if (vi->status == v)
return;
+ if (v & VIRTIO_NET_S_ANNOUNCE) {
+ if ((v & VIRTIO_NET_S_LINK_UP) &&
+ virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ANNOUNCE))
+ schedule_work(&vi->announce);
+ v &= ~VIRTIO_NET_S_ANNOUNCE;
+ vi->vdev->config->set(vi->vdev,
+ offsetof(struct virtio_net_config,
+ status),
+ &v, sizeof(v));
+ }
+
vi->status = v;
if (vi->status & VIRTIO_NET_S_LINK_UP) {
@@ -937,6 +958,7 @@ static void virtnet_update_status(struct virtnet_info *vi)
netif_carrier_off(vi->dev);
netif_stop_queue(vi->dev);
}
+
}
static void virtnet_config_changed(struct virtio_device *vdev)
@@ -1016,6 +1038,8 @@ static int virtnet_probe(struct virtio_device *vdev)
goto free;
INIT_DELAYED_WORK(&vi->refill, refill_work);
+ if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE))
+ INIT_WORK(&vi->announce, announce_work);
sg_init_table(vi->rx_sg, ARRAY_SIZE(vi->rx_sg));
sg_init_table(vi->tx_sg, ARRAY_SIZE(vi->tx_sg));
@@ -1077,6 +1101,8 @@ static int virtnet_probe(struct virtio_device *vdev)
unregister:
unregister_netdev(dev);
cancel_delayed_work_sync(&vi->refill);
+ if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ANNOUNCE))
+ cancel_work_sync(&vi->announce);
free_vqs:
vdev->config->del_vqs(vdev);
free_stats:
@@ -1118,6 +1144,8 @@ static void __devexit virtnet_remove(struct virtio_device *vdev)
unregister_netdev(vi->dev);
cancel_delayed_work_sync(&vi->refill);
+ if(virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ANNOUNCE))
+ cancel_work_sync(&vi->announce);
/* Free unused buffers in both send and recv, if any. */
free_unused_bufs(vi);
@@ -1144,6 +1172,7 @@ static unsigned int features[] = {
VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
+ VIRTIO_NET_F_GUEST_ANNOUNCE,
};
static struct virtio_driver virtio_net_driver = {
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 970d5a2..44a38d6 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -49,8 +49,10 @@
#define VIRTIO_NET_F_CTRL_RX 18 /* Control channel RX mode support */
#define VIRTIO_NET_F_CTRL_VLAN 19 /* Control channel VLAN filtering */
#define VIRTIO_NET_F_CTRL_RX_EXTRA 20 /* Extra RX mode control support */
+#define VIRTIO_NET_F_GUEST_ANNOUNCE 21 /* Guest can send gratuitous packets */
#define VIRTIO_NET_S_LINK_UP 1 /* Link is up */
+#define VIRTIO_NET_S_ANNOUNCE 2 /* Announcement is needed */
struct virtio_net_config {
/* The config defining mac address (if VIRTIO_NET_F_MAC) */
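The guest-side half, as implemented in virtnet_update_status() above, masks unknown bits, schedules the announcement only when the link is up and the feature was negotiated, and acknowledges by writing the status back with the announce bit cleared. The sketch below models that flow with hypothetical names (`struct guest_nic`, `guest_update_status`); counters and a field stand in for schedule_work() and config->set().

```c
#include <stdbool.h>
#include <stdint.h>

#define NET_S_LINK_UP  1
#define NET_S_ANNOUNCE 2

/* Hypothetical guest driver state. */
struct guest_nic {
	bool has_announce_feature;  /* VIRTIO_NET_F_GUEST_ANNOUNCE negotiated */
	uint16_t status;            /* last status the driver acted on */
	uint16_t written_back;      /* last value written to config.status */
	int announces_scheduled;    /* stands in for schedule_work() */
};

/* Handle a config-change interrupt carrying device status v. */
static void guest_update_status(struct guest_nic *g, uint16_t v)
{
	v &= NET_S_LINK_UP | NET_S_ANNOUNCE;    /* ignore unknown bits */
	if (g->status == v)
		return;

	if (v & NET_S_ANNOUNCE) {
		if ((v & NET_S_LINK_UP) && g->has_announce_feature)
			g->announces_scheduled++;
		v &= (uint16_t)~NET_S_ANNOUNCE; /* acknowledge the request */
		g->written_back = v;            /* stands in for config->set() */
	}
	g->status = v;
}
```

Because the cleared value is what gets stored, a later interrupt carrying just LINK_UP is treated as "no change", so one request triggers one announcement.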
* Re: Kernel panic from tg3 net driver
From: Ari Savolainen @ 2011-10-22 6:12 UTC
To: RongQing Li
Cc: Eric Dumazet, David Miller, richardcochran, netdev, linux-kernel
In-Reply-To: <CAJFZqHzWUsP7szrjVwmW5+HWRND3R4qxWcnK=bqbsgL6F7XGeA@mail.gmail.com>
2011/10/21 RongQing Li <roy.qing.li@gmail.com>:
> Hi Ari:
>
> Are you sure the patch is applied correctly and the log is same?
> If the log is not same, could you paste it again.
>
> Thanks
Yes, I'm sure. The panic and the RCU splat are unrelated. The panic
occurs when skb_tx_timestamp() is called after the skb has been
freed by tigon3_dma_hwbug_workaround().
Ari
* Re: Kernel panic from tg3 net driver
From: Eric Dumazet @ 2011-10-22 6:37 UTC
To: Ari Savolainen
Cc: RongQing Li, David Miller, richardcochran, netdev, linux-kernel
In-Reply-To: <CAEbykaVJneQ0ozUvYh2PRn-p_BowL4z4_4Y8EFCdO9mOsNX7OQ@mail.gmail.com>
On Saturday, 22 October 2011 at 09:12 +0300, Ari Savolainen wrote:
> 2011/10/21 RongQing Li <roy.qing.li@gmail.com>:
> > Hi Ari:
> >
> > Are you sure the patch is applied correctly and the log is same?
> > If the log is not same, could you paste it again.
> >
> > Thanks
>
> Yes, I'm sure. The panic and the rcu splat are unrelated. The panic
> occurs when skb_tx_timestamp is being called after skb having been
> freed by tigon3_dma_hwbug_workaround.
OK, that makes sense, thanks!
Do you plan to submit a patch, now that you have found the bug?
* Re: [PATCH] net: add sysctl allow_so_priority for SO_PRIORITY setsockopt
From: Maciej Żenczykowski @ 2011-10-22 6:49 UTC
To: David Miller; +Cc: netdev
In-Reply-To: <20111022.000406.350185785547409199.davem@davemloft.net>
2011/10/21 David Miller <davem@davemloft.net>:
> From: Maciej Żenczykowski <zenczykowski@gmail.com>
> Date: Fri, 21 Oct 2011 15:22:05 -0700
>
>> From: Maciej Żenczykowski <maze@google.com>
>>
>> This change adds a sysctl (/proc/sys/net/core/allow_so_priority)
>> with a default of true (1), as such it does not change the default
>> behaviour of the Linux kernel.
>>
>> This sysctl can be set to false (0), this will result in non
>> CAP_NET_ADMIN processes being unable to set SO_PRIORITY socket
>> option.
>>
>> This is desireable if we want to rely on socket/skb priorities
>> being inferred from TOS/TCLASS bits.
>>
>> Signed-off-by: Maciej Żenczykowski <maze@google.com>
>
> The socket layer is not the place to enforce this.
>
> The ingress into your MPLS/RSVP cloud that actually provides the
> quality of service is where you control and mangle the TOS as needed.
>
> Sorry, I'm not applying anything like this. Any machine on your
> network can spit out any TOS it wants, and if you have control over
> the apps change it's behavior there. If you don't have control over
> the apps then filter and mangle.
Hmm, so I already have container (cgroup) limits on which TOS settings
a process is allowed to set (query: would you be interested in accepting
a patch for that at some point in the future?).
Normally, setting IP_TOS also automatically sets sock->sk_priority
(based on a mapping), which gets inherited into skb->priority, which can
then be used for things like hardware priority queue dispatch (basically
XPS + skb->priority queue selection), either via an XPS-like mechanism,
a qdisc-like mechanism (preferred), or an in-driver mechanism (I
currently have a hack that does this).
However, processes can also manually override sk_priority by calling
SO_PRIORITY directly, at which point their IP_TOS and SO_PRIORITY no
longer match.
This patch allows you to disable this ability. It doesn't affect the
on-the-wire bits in any way; it only affects packet classification at
the qdisc and in the driver.
As you can see, this patch isn't about TOS; it's about the
kernel-internal skb priority setting.
On a related note, while setting IP_TOS sets sk_priority, setting
IPV6_TCLASS does not. I'd like this behaviour to be consistent, so I was
planning on sending you a patch to add
sk_priority = rt_tos2priority(val) to the IPV6_TCLASS setsockopt
code path. Is that OK?
- Maciej
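The IP_TOS to sk_priority mapping Maciej refers to can be sketched as a 16-entry table indexed by the four RFC 1349 TOS bits. The table below is modeled on the kernel's rt_tos2priority() / ip_tos2prio[] of that era and is re-declared locally so the sketch stands alone. Treat it as an illustration rather than a copy of the kernel source. The TC_PRIO_* values match those in linux/pkt_sched.h; `tos_to_priority` is a hypothetical name.

```c
#include <stdint.h>

/* Packet-scheduler priority classes (values as in linux/pkt_sched.h). */
enum {
	TC_PRIO_BESTEFFORT       = 0,
	TC_PRIO_BULK             = 2,
	TC_PRIO_INTERACTIVE_BULK = 4,
	TC_PRIO_INTERACTIVE      = 6,
};

#define IPTOS_TOS_MASK 0x1E
#define IPTOS_TOS(tos) ((tos) & IPTOS_TOS_MASK)

/* Index bits: 0 = min cost, 1 = reliability, 2 = throughput, 3 = low delay. */
static const uint8_t tos2prio[16] = {
	TC_PRIO_BESTEFFORT, TC_PRIO_BESTEFFORT, TC_PRIO_BESTEFFORT, TC_PRIO_BESTEFFORT,
	TC_PRIO_BULK, TC_PRIO_BULK, TC_PRIO_BULK, TC_PRIO_BULK,
	TC_PRIO_INTERACTIVE, TC_PRIO_INTERACTIVE, TC_PRIO_INTERACTIVE, TC_PRIO_INTERACTIVE,
	TC_PRIO_INTERACTIVE_BULK, TC_PRIO_INTERACTIVE_BULK,
	TC_PRIO_INTERACTIVE_BULK, TC_PRIO_INTERACTIVE_BULK,
};

/* Priority a socket would inherit from its TOS byte (ECN bits ignored). */
static uint8_t tos_to_priority(uint8_t tos)
{
	return tos2prio[IPTOS_TOS(tos) >> 1];
}
```

For example, IPTOS_LOWDELAY (0x10) maps to priority 6 and IPTOS_THROUGHPUT (0x08) to 2. This is the value a later SO_PRIORITY call can overwrite, which is the mismatch the patch is meant to prevent.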
* Re: [PATCH] net: add sysctl allow_so_priority for SO_PRIORITY setsockopt
From: David Miller @ 2011-10-22 6:58 UTC
To: zenczykowski; +Cc: netdev
In-Reply-To: <CAHo-OoxtAmzPjr4R7jQOk9GGQo7i-qVmoaY8vTYDQDj1HXsPNA@mail.gmail.com>
From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Fri, 21 Oct 2011 23:49:00 -0700
> However, processes can also manually override the sk_priority by calling
> SO_PRIORITY directly, at which point their IP_TOS and SO_PRIORITY no
> longer match.
>
> This patch allows you to disable this ability.
I also don't see why we'd want to allow disabling this either.
I really hate these patches that offer ways to disable things
that normally work, and thus break apps when the non-default
is selected.
I kind of have a feeling the kind of situation you're trying to
account for, you have some cloud where people run random stuff
that you don't control.
But you didn't specify this, and we just have to guess. Why don't you
describe the specific situation where you want to modify this setting?
Please do this instead of just talking about what the side effects are
inside of the kernel. That's much less interesting when it comes to
patches like this.
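The patch body itself is not shown in this thread. What follows is only a hypothetical model of the behaviour its description implies: `allow_so_priority` stands for the proposed /proc/sys/net/core/allow_so_priority knob, and `set_so_priority()` is an invented name. As far as I know, stock kernels of that era let unprivileged callers set SO_PRIORITY values 0 through 6 and require CAP_NET_ADMIN for anything else. With the knob at 0, the sketch refuses all unprivileged changes, leaving the TOS-derived priority intact.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the proposed sysctl; default 1 keeps stock behaviour. */
static int allow_so_priority = 1;

/* Models the SO_PRIORITY setsockopt check; returns 0 or -EPERM. */
static int set_so_priority(int *sk_priority, int val, bool cap_net_admin)
{
	if (!cap_net_admin) {
		if (!allow_so_priority)
			return -EPERM;  /* knob off: keep the TOS-derived value */
		if (val < 0 || val > 6)
			return -EPERM;  /* stock limit for unprivileged callers */
	}
	*sk_priority = val;
	return 0;
}
```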
* [PATCH] tg3: fix tigon3_dma_hwbug_workaround()
From: Eric Dumazet @ 2011-10-22 7:25 UTC
To: Ari Savolainen
Cc: RongQing Li, David Miller, richardcochran, netdev, linux-kernel
In-Reply-To: <1319265470.6180.13.camel@edumazet-laptop>
Ari got kernel panics using tg3 NIC, and bisected to 2669069aacc9 "tg3:
enable transmit time stamping."
This is because tigon3_dma_hwbug_workaround() might alloc a new skb and
free the original. We panic when skb_tx_timestamp() is called on freed
skb.
Reported-by: Ari Savolainen <ari.m.savolainen@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
drivers/net/tg3.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 4a1374d..6149dc5 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -6029,12 +6029,12 @@ static void tg3_tx_skb_unmap(struct tg3_napi *tnapi, u32 entry, int last)
/* Workaround 4GB and 40-bit hardware DMA bugs. */
static int tigon3_dma_hwbug_workaround(struct tg3_napi *tnapi,
- struct sk_buff *skb,
+ struct sk_buff **pskb,
u32 *entry, u32 *budget,
u32 base_flags, u32 mss, u32 vlan)
{
struct tg3 *tp = tnapi->tp;
- struct sk_buff *new_skb;
+ struct sk_buff *new_skb, *skb = *pskb;
dma_addr_t new_addr = 0;
int ret = 0;
@@ -6076,7 +6076,7 @@ static int tigon3_dma_hwbug_workaround(struct tg3_napi *tnapi,
}
dev_kfree_skb(skb);
-
+ *pskb = new_skb;
return ret;
}
@@ -6305,7 +6305,7 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
*/
entry = tnapi->tx_prod;
budget = tg3_tx_avail(tnapi);
- if (tigon3_dma_hwbug_workaround(tnapi, skb, &entry, &budget,
+ if (tigon3_dma_hwbug_workaround(tnapi, &skb, &entry, &budget,
base_flags, mss, vlan))
goto out_unlock;
}
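The fix follows a common C pattern. When a helper may free the caller's buffer and substitute a new one, it takes a pointer to the caller's pointer and updates it. That way, later uses such as skb_tx_timestamp(skb) see the live buffer. The sketch below uses a hypothetical `struct buf` in place of `struct sk_buff` and malloc/free in place of skb allocation. Like the patched tigon3_dma_hwbug_workaround(), it frees the original in both outcomes and stores the new pointer, which is NULL on failure.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff. */
struct buf {
	size_t len;
	unsigned char data[64];
};

/* May replace *pbuf with a fresh copy. The original is always freed and
 * *pbuf always updated, so the caller never keeps a dangling pointer.
 * Returns 0 on success, -1 if the copy could not be allocated. */
static int bounce_copy(struct buf **pbuf)
{
	struct buf *old = *pbuf;
	struct buf *copy = malloc(sizeof(*copy));
	int ret = 0;

	if (copy == NULL)
		ret = -1;
	else
		memcpy(copy, old, sizeof(*copy));

	free(old);       /* the original is consumed either way */
	*pbuf = copy;    /* NULL on failure */
	return ret;
}
```

The caller passes `&b` and, on success, keeps using `b`, mirroring how tg3_start_xmit() now passes `&skb`.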
* Re: [PATCH] tg3: fix tigon3_dma_hwbug_workaround()
From: David Miller @ 2011-10-22 7:30 UTC
To: eric.dumazet
Cc: ari.m.savolainen, roy.qing.li, richardcochran, netdev,
linux-kernel
In-Reply-To: <1319268338.6180.20.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 22 Oct 2011 09:25:38 +0200
> Ari got kernel panics using tg3 NIC, and bisected to 2669069aacc9 "tg3:
> enable transmit time stamping."
>
> This is because tigon3_dma_hwbug_workaround() might alloc a new skb and
> free the original. We panic when skb_tx_timestamp() is called on freed
> skb.
>
> Reported-by: Ari Savolainen <ari.m.savolainen@gmail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied, thanks Eric.
* [BUG] bonding : LOCKDEP warning
From: Eric Dumazet @ 2011-10-22 7:36 UTC
To: netdev
On the latest net-next I got the following splat:
[ 5.749651] bonding: Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
[ 5.749655] bonding: MII link monitoring set to 100 ms
[ 5.749676] BUG: key f49a831c not in .data!
[ 5.749677] ------------[ cut here ]------------
[ 5.749752] WARNING: at kernel/lockdep.c:2897 lockdep_init_map+0x1c3/0x460()
[ 5.749809] Hardware name: ProLiant BL460c G1
[ 5.749862] Modules linked in: bonding(+)
[ 5.749978] Pid: 3177, comm: modprobe Not tainted 3.1.0-rc9-02177-gf2d1a4e-dirty #1157
[ 5.750066] Call Trace:
[ 5.750120] [<c1352c2f>] ? printk+0x18/0x21
[ 5.750176] [<c103112d>] warn_slowpath_common+0x6d/0xa0
[ 5.750231] [<c1060133>] ? lockdep_init_map+0x1c3/0x460
[ 5.750287] [<c1060133>] ? lockdep_init_map+0x1c3/0x460
[ 5.750342] [<c103117d>] warn_slowpath_null+0x1d/0x20
[ 5.750398] [<c1060133>] lockdep_init_map+0x1c3/0x460
[ 5.750453] [<c1355ddd>] ? _raw_spin_unlock+0x1d/0x20
[ 5.750510] [<c11255c8>] ? sysfs_new_dirent+0x68/0x110
[ 5.750565] [<c1124d4b>] sysfs_add_file_mode+0x8b/0xe0
[ 5.750621] [<c1124db3>] sysfs_add_file+0x13/0x20
[ 5.750675] [<c1124e7c>] sysfs_create_file+0x1c/0x20
[ 5.750737] [<c1208f09>] class_create_file+0x19/0x20
[ 5.750794] [<c12c186f>] netdev_class_create_file+0xf/0x20
[ 5.750853] [<f85deaf4>] bond_create_sysfs+0x44/0x90 [bonding]
[ 5.750911] [<f8410947>] ? bond_create_proc_dir+0x1e/0x3e [bonding]
[ 5.750970] [<f841007e>] bond_net_init+0x7e/0x87 [bonding]
[ 5.751026] [<f8410000>] ? 0xf840ffff
[ 5.751080] [<c12abc7a>] ops_init.clone.4+0xba/0x100
[ 5.751135] [<c12abdb2>] ? register_pernet_subsys+0x12/0x30
[ 5.751191] [<c12abd03>] register_pernet_operations.clone.3+0x43/0x80
[ 5.751249] [<c12abdb9>] register_pernet_subsys+0x19/0x30
[ 5.751306] [<f84108b9>] bonding_init+0x832/0x8a2 [bonding]
[ 5.751363] [<c10011f0>] do_one_initcall+0x30/0x160
[ 5.751420] [<f8410087>] ? bond_net_init+0x87/0x87 [bonding]
[ 5.751477] [<c106d5cf>] sys_init_module+0xef/0x1890
[ 5.751533] [<c1356490>] sysenter_do_call+0x12/0x36
[ 5.751588] ---[ end trace 89f492d83a7f5006 ]---
* Re: [PATCH] tg3: fix tigon3_dma_hwbug_workaround()
From: Ari Savolainen @ 2011-10-22 7:54 UTC
To: Eric Dumazet
Cc: RongQing Li, David Miller, richardcochran, netdev, linux-kernel
In-Reply-To: <1319268338.6180.20.camel@edumazet-laptop>
I tried a similar patch earlier and got another panic with that. I was
quite tired at that time and may have made a mistake. I'll test Eric's
patch either later today or tomorrow.
Ari
* Re: [BUG] bonding : LOCKDEP warning
From: David Miller @ 2011-10-22 7:58 UTC
To: eric.dumazet; +Cc: netdev, ebiederm
In-Reply-To: <1319268987.6180.21.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 22 Oct 2011 09:36:27 +0200
> On latest net-next I got following splat
I suspect Biederman's namespace patch set.
Eric B. please take a look.
* Re: [PATCH] net: add sysctl allow_so_priority for SO_PRIORITY setsockopt
From: Maciej Żenczykowski @ 2011-10-22 8:27 UTC
To: David Miller; +Cc: netdev
In-Reply-To: <20111022.025836.1306779710775525629.davem@davemloft.net>
> I also don't see why we'd want to allow disabling this either.
> I really hate these patches that offer ways to disable things
> that normally work, and thus break apps when the non-default
> is selected.
Well... the purpose of settings like this is precisely to break functionality
when the default is not set ;-)
> I kind of have a feeling the kind of situation you're trying to
> account for, you have some cloud where people run random stuff
> that you don't control.
Yes. I have control of the kernel, of root, and of some daemons that are
running on the machine, but I don't really have control of the entirety
of userspace. For some of it I have source code and could audit it to
guarantee correctness (but I can't really enforce that on the users;
ultimately they can run any binary), and for some of it I don't even
have that. Either way, it's much easier to delegate setting policy to
userspace management daemon(s) and leave enforcing it to the kernel.
This is just one more such knob.
> But you didn't specify this, and we just have to guess. Why don't you
> describe the specific situation where you want to modify this setting?
> Please do this instead of just talking about what the side effects are
> inside of the kernel. That's much less interesting when it comes to
> patches like this.
Very well, that's a good point.
Here's an attempt to provide some insight.
I am attempting to allow not-fully-code-audited nor fully trusted apps to run
in a cgroup containerized environment, with many apps in many
containers (not 1:1, has hierarchies) on a single kernel.
The apps are in the "believed not to be actively malicious" class, but are
very likely to be buggy, or written by ill-advised programmers based
on wrong, outdated, or otherwise incorrect documentation. I cannot rely
on unprivileged userspace getting things right.
I have to have some mechanism to grant these apps permissions to
utilize specific levels of network fabric priority. For this I have
the aforementioned per-cgroup allowed TOS settings. VLANs are not appropriate
because a client with high priority net privs is allowed to send a
request to a server with no special priority permissions.
(there are further patches to support tcp tos reflection so the server
can automatically respond with the client's priority)
Multiqueue networking combined with hardware priority queues and XPS
wants to use skb->priority plus the active CPU for TX queue selection.
In this particular case TX queue selection should happen based on the
TOS priority.
Setting TOS automatically sets sk_priority (and hence skb->priority).
So all's good, so long as userspace doesn't go and change the
sk_priority field via SO_PRIORITY and break the mapping.
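To make the mapping described above concrete, here is a small user-space model of it. When IP_TOS is set, the kernel derives sk_priority from the TOS byte through the ip_tos2prio table in net/ipv4/route.c (transcribed below), while a later SO_PRIORITY call overwrites sk_priority and leaves the TOS alone. This is a sketch, not kernel code: struct sock_model and the model_* helpers are hypothetical names, and the real setsockopt paths also handle ECN bits, locking and capability checks.

```c
#include <stdint.h>

/* Values mirror TC_PRIO_* from <linux/pkt_sched.h>. */
enum {
	PRIO_BESTEFFORT       = 0,
	PRIO_BULK             = 2,
	PRIO_INTERACTIVE_BULK = 4,
	PRIO_INTERACTIVE      = 6,
};

/* The four RFC 1349 TOS bits (IPTOS_TOS_MASK in the kernel). */
#define MODEL_TOS_MASK 0x1E

/* Transcribed from ip_tos2prio[]; ECN_OR_COST(x) equals x there. */
static const uint8_t model_tos2prio[16] = {
	PRIO_BESTEFFORT, PRIO_BESTEFFORT, PRIO_BESTEFFORT, PRIO_BESTEFFORT,
	PRIO_BULK, PRIO_BULK, PRIO_BULK, PRIO_BULK,
	PRIO_INTERACTIVE, PRIO_INTERACTIVE, PRIO_INTERACTIVE, PRIO_INTERACTIVE,
	PRIO_INTERACTIVE_BULK, PRIO_INTERACTIVE_BULK,
	PRIO_INTERACTIVE_BULK, PRIO_INTERACTIVE_BULK,
};

/* Same indexing as the kernel's rt_tos2priority(). */
static int model_tos2priority(uint8_t tos)
{
	return model_tos2prio[(tos & MODEL_TOS_MASK) >> 1];
}

/* Hypothetical stand-in for the two struct sock fields involved. */
struct sock_model {
	uint8_t tos;
	uint32_t priority;
};

/* Models setsockopt(IP_TOS): a changed TOS also rewrites the priority. */
static void model_set_tos(struct sock_model *sk, uint8_t tos)
{
	if (sk->tos != tos) {
		sk->tos = tos;
		sk->priority = (uint32_t)model_tos2priority(tos);
	}
}

/* Models setsockopt(SO_PRIORITY): only the priority changes. */
static void model_set_priority(struct sock_model *sk, uint32_t prio)
{
	sk->priority = prio;
}
```

After model_set_priority() the socket still carries its TOS (for example IPTOS_LOWDELAY, 0x10, which maps to priority 6), but the priority no longer matches it. That is the desynchronisation described above, and the SO_PRIORITY step is the one the proposed allow_so_priority sysctl is aimed at.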
As a further note:
Some of these apps may be a little more special, a little more
audited, and a little more trusted.
Enough so that they might be granted CAP_NET_RAW, but not enough so
that they can get CAP_NET_ADMIN.
Hence the general desire for CAP_NET_ADMIN to control general
machine-global networking state, but not to control per-socket or
per-packet settings. For example, bringing an interface up or down
affects everyone (hence it must require CAP_NET_ADMIN and be much more
tightly controlled), while spoofing a packet doesn't really negatively
affect anyone (you can't assume the network is trusted, so there can
be external sources of spoofing or eavesdropping anyway).
---
I could attempt to publish the vast majority of our internal
networking code base (there isn't really anything secret in there),
but it's based on 2.6.34 and even after two years of attempting to
clean it up and refactor it (along with a rebase from 2.6.26, and all
while actively continuing development) I'm still not at the point where
I would consider this to be a particularly useful course of action
(there are a lot of bugfixes of bugfixes of crappy patches in there,
plus hacks, plus tons of backports from upstream, and tons of code
which is upstream but slightly different than we have it internally,
because we had it first and pushed v2 upstream, etc.). Instead I'm
trying to get the low-hanging fruit out of the way, rebase our
patches onto probably 3.2 or 3.3, likely sending some more your way
during the process, and see where that leaves us. Basically trying to
reduce the delta. We will always have internal only patches, but the
fewer, the less burden for us, hence I'm trying to get the ones I
believe to be potentially useful externally upstreamed. Obviously
whatever patches you don't accept, we'll still keep around locally.
Maciej
^ permalink raw reply
* Re: [PATCH] net: add sysctl allow_so_priority for SO_PRIORITY setsockopt
From: David Miller @ 2011-10-22 8:40 UTC (permalink / raw)
To: zenczykowski; +Cc: netdev
In-Reply-To: <CAHo-OoyVSbsxb8U3Y5WCNRsxjr00g1O3HJcT1fmu5cmP5i-JsA@mail.gmail.com>
From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Sat, 22 Oct 2011 01:27:03 -0700
> I am attempting to allow not-fully-code-audited nor fully trusted apps to run
> in a cgroup containerized environment, with many apps in many
> containers (not 1:1, has hierarchies) on a single kernel.
Extend, if necessary, the cgroup classifier so you can use it to clip
off the socket inherited priority in the SKB for this cgroup.
Really, this control has no business in the socket API layer.
^ permalink raw reply
* [PATCH net-next] w5300: add WIZnet W5300 Ethernet driver
From: Taehun Kim @ 2011-10-22 8:41 UTC (permalink / raw)
To: David S. Miller; +Cc: linux-kernel, netdev, romieu, suhwan, bongbong
Hello, all.
I have rewritten the W5300 driver, applying Francois Romieu's feedback
(http://marc.info/?l=linux-netdev&m=131714561419786&w=2).
This driver has been tested on an ARM board.
Please review this driver and apply it if there are no problems.
Thank you.
T.K.
Signed-off-by: Taehun Kim <kth3321@gmail.com>
---
drivers/net/ethernet/Kconfig | 1 +
drivers/net/ethernet/Makefile | 1 +
drivers/net/ethernet/wiznet/Kconfig | 32 ++
drivers/net/ethernet/wiznet/Makefile | 5 +
drivers/net/ethernet/wiznet/w5300.c | 706 ++++++++++++++++++++++++++++++++++
drivers/net/ethernet/wiznet/w5300.h | 121 ++++++
6 files changed, 866 insertions(+), 0 deletions(-)
create mode 100644 drivers/net/ethernet/wiznet/Kconfig
create mode 100644 drivers/net/ethernet/wiznet/Makefile
create mode 100644 drivers/net/ethernet/wiznet/w5300.c
create mode 100644 drivers/net/ethernet/wiznet/w5300.h
diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index 6dff5a0..6325d85 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -173,5 +173,6 @@ source "drivers/net/ethernet/tundra/Kconfig"
source "drivers/net/ethernet/via/Kconfig"
source "drivers/net/ethernet/xilinx/Kconfig"
source "drivers/net/ethernet/xircom/Kconfig"
+source "drivers/net/ethernet/wiznet/Kconfig"
endif # ETHERNET
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index c53ad3a..7bd5211 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -72,3 +72,4 @@ obj-$(CONFIG_NET_VENDOR_TUNDRA) += tundra/
obj-$(CONFIG_NET_VENDOR_VIA) += via/
obj-$(CONFIG_NET_VENDOR_XILINX) += xilinx/
obj-$(CONFIG_NET_VENDOR_XIRCOM) += xircom/
+obj-$(CONFIG_NET_VENDOR_WIZNET) += wiznet/
diff --git a/drivers/net/ethernet/wiznet/Kconfig b/drivers/net/ethernet/wiznet/Kconfig
new file mode 100644
index 0000000..b5925bd
--- /dev/null
+++ b/drivers/net/ethernet/wiznet/Kconfig
@@ -0,0 +1,32 @@
+#
+# WIZnet device configuration
+#
+
+config NET_VENDOR_WIZNET
+ bool "WIZnet devices"
+ default y
+ ---help---
+ If you have a network (Ethernet) card belonging to this class, say Y
+ and read the Ethernet-HOWTO, available from
+ <http://www.tldp.org/docs.html#howto>.
+
+ Note that the answer to this question doesn't directly affect the
+ kernel: saying N will just cause the configurator to skip all
+ the questions about WIZnet devices. If you say Y, you will be asked for
+ your specific card in the following questions.
+
+if NET_VENDOR_WIZNET
+
+config W5300
+ tristate "WIZnet W5300 Ethernet support"
+ depends on ARM
+ ---help---
+ This driver supports the Ethernet function of the WIZnet W5300 chip.
+ The W5300 also has a hardwired TCP/IP stack, but this driver only
+ uses its Ethernet (MACRAW) function; using the hardwired stack would
+ require modifying the TCP/IP stack in the Linux kernel.
+
+ To compile this driver as a module, choose M here: the module
+ will be called w5300.
+
+endif # NET_VENDOR_WIZNET
diff --git a/drivers/net/ethernet/wiznet/Makefile b/drivers/net/ethernet/wiznet/Makefile
new file mode 100644
index 0000000..53120bc
--- /dev/null
+++ b/drivers/net/ethernet/wiznet/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the WIZnet device drivers.
+#
+
+obj-$(CONFIG_W5300) += w5300.o
diff --git a/drivers/net/ethernet/wiznet/w5300.c b/drivers/net/ethernet/wiznet/w5300.c
new file mode 100644
index 0000000..14bbfee
--- /dev/null
+++ b/drivers/net/ethernet/wiznet/w5300.c
@@ -0,0 +1,706 @@
+/* w5300.c: A Linux Ethernet driver for the WIZnet W5300 chip. */
+/*
+ Copyright (C) 2011 Taehun Kim <kth3321@gmail.com>
+
+ This software may be used and distributed according to the terms of
+ the GNU General Public License (GPL), incorporated herein by reference.
+ Drivers based on or derived from this code fall under the GPL and must
+ retain the authorship, copyright and license notice. This file is not
+ a complete program and may only be used when the entire operating
+ system is licensed under the GPL.
+*/
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/errno.h>
+
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/skbuff.h>
+
+#include <linux/device.h>
+#include <linux/platform_device.h>
+#include <linux/ioport.h>
+#include <linux/io.h>
+
+#include "w5300.h"
+
+#define DEV_NAME "w5300"
+#define DRV_VERSION "1.0"
+#define DRV_RELDATE "Oct 22, 2011"
+
+#define W5300_DEF_MSG_ENABLE \
+ (NETIF_MSG_DRV | \
+ NETIF_MSG_TIMER | \
+ NETIF_MSG_IFUP | \
+ NETIF_MSG_RX_ERR | \
+ NETIF_MSG_INTR | \
+ NETIF_MSG_TX_DONE)
+
+static const char driver_info[] =
+ KERN_INFO DEV_NAME ": Ethernet driver v" DRV_VERSION "("
+ DRV_RELDATE ")\n";
+
+MODULE_AUTHOR("Taehun Kim <kth3321@gmail.com>");
+MODULE_DESCRIPTION("WIZnet W5300 Ethernet driver");
+MODULE_VERSION(DRV_VERSION);
+MODULE_LICENSE("GPL");
+
+/* Transmit timeout, default 5 seconds. */
+static int watchdog = 5000;
+module_param(watchdog, int, 0400);
+MODULE_PARM_DESC(watchdog, "transmit timeout in milliseconds");
+
+static int debug = -1;
+module_param(debug, int, 0);
+MODULE_PARM_DESC(debug, "W5300: bitmapped message enable number");
+
+/*
+ * Private state of the W5300 driver.
+ * Additional information is included in struct net_device.
+ */
+struct wiz_private {
+ void __iomem *base;
+ struct net_device *dev;
+ u8 rxbuf_conf[MAX_SOCK_NUM];
+ u8 txbuf_conf[MAX_SOCK_NUM];
+ struct net_device_stats stats;
+ struct napi_struct napi;
+ spinlock_t lock;
+ u32 msg_enable;
+};
+
+/* Default MAC address. */
+static const u8 w5300_defmac[6] = {0x00, 0x08, 0xDC, 0xA0, 0x00, 0x01};
+
+/* Default RX/TX buffer size per channel (KB). */
+static const u8 w5300_rxbuf_conf[MAX_SOCK_NUM] = {
+ 64, 0, 0, 0, 0, 0, 0, 0
+};
+
+static const u8 w5300_txbuf_conf[MAX_SOCK_NUM] = {
+ 64, 0, 0, 0, 0, 0, 0, 0
+};
+
+/* Return the number of received bytes waiting in the RX FIFO. */
+static int w5300_get_rxsize(struct wiz_private *wp, int s)
+{
+ u32 val;
+
+ val = w5300_read(wp, Sn_RX_RSR(s));
+ val = (val << 16) + w5300_read(wp, Sn_RX_RSR(s) + 2);
+ return val;
+}
+
+/* Packet Receive Function. It reads received packet from the Rx FIFO. */
+static void w5300_recv_data(struct wiz_private *wp, int s, u8 *buf,
+ ssize_t len)
+{
+ int i;
+ u16 recv_data;
+
+ /* read from RX FIFO */
+ for (i = 0; i < len; i += 2) {
+ recv_data = w5300_read(wp, Sn_RX_FIFO(s));
+ buf[i] = (u8) ((recv_data & 0xFF00) >> 8);
+ buf[i + 1] = (u8) (recv_data & 0x00FF);
+ }
+}
+
+/* Setting MAC address of W5300 */
+static void w5300_set_macaddr(struct wiz_private *wp, u8 *addr)
+{
+ int i;
+
+ for (i = 0; i < 3; ++i) {
+ u16 mac_addr = (addr[2*i] << 8) | addr[2*i+1];
+
+ w5300_write(wp, SHAR + 2*i, mac_addr);
+ }
+}
+
+/* Opening channels of W5300 */
+static int w5300_channel_open(struct wiz_private *wp, u32 type)
+{
+ int timeout = 1000;
+
+ /* Which type will be used for open? */
+ switch (type) {
+ case Sn_MR_MACRAW:
+ case Sn_MR_MACRAW_MF:
+ w5300_write(wp, Sn_MR(0), type);
+ break;
+ default:
+ netif_err(wp, ifup, wp->dev,
+ "Unknown socket type (%u)\n", type);
+
+ return -EINVAL;
+ }
+
+ w5300_write(wp, Sn_CR(0), Sn_CR_OPEN);
+
+ while (timeout--) {
+ if (!w5300_read(wp, Sn_CR(0)))
+ return 0;
+ udelay(1);
+ }
+
+ return -EBUSY;
+}
+
+/* Activating the interrupt of related channel */
+static void w5300_interrupt_enable(struct wiz_private *wp)
+{
+ u16 mask;
+
+ mask = w5300_read(wp, IMR) | 0x1;
+ w5300_write(wp, IMR, mask);
+}
+
+/* De-activating the interrupt of related channel */
+static void w5300_interrupt_disable(struct wiz_private *wp)
+{
+ u16 mask;
+
+ mask = w5300_read(wp, IMR) & ~0x1;
+ w5300_write(wp, IMR, mask);
+}
+
+/* W5300 initialization function */
+static int w5300_reset(struct net_device *dev)
+{
+ struct wiz_private *wp = netdev_priv(dev);
+ u32 txbuf_total = 0, i;
+ u16 mem_cfg = 0;
+ u16 rx_size, tx_size;
+
+ netif_dbg(wp, drv, wp->dev, "w5300 chip reset\n");
+
+ /* W5300 is initialized by sending RESET command. */
+ w5300_write(wp, MR, MR_RST);
+ mdelay(5);
+
+ /* Mode register: pings are answered by the Linux software stack,
+ * so set Ping Block to keep the chip from replying itself. */
+ w5300_write(wp, MR, MR_WDF(1) | MR_PB);
+
+ /* Setting MAC address */
+ w5300_set_macaddr(wp, dev->dev_addr);
+
+ /* Setting the size of Rx/Tx FIFO */
+ for (i = 0; i < MAX_SOCK_NUM; ++i) {
+ if (wp->rxbuf_conf[i] > 64) {
+ netif_err(wp, drv, wp->dev,
+ "Illegal Channel(%d) RX memory size.\n", i);
+
+ return -EINVAL;
+ }
+ if (wp->txbuf_conf[i] > 64) {
+ netif_err(wp, drv, wp->dev,
+ "Illegal Channel(%d) TX memory size.\n", i);
+
+ return -EINVAL;
+ }
+ txbuf_total += wp->txbuf_conf[i];
+ }
+
+ if (txbuf_total % 8) {
+ netif_err(wp, drv, wp->dev,
+ "Illegal memory size register setting.\n");
+
+ return -EINVAL;
+ }
+
+ for (i = 0; i < 4; ++i) {
+ rx_size = (wp->rxbuf_conf[2*i] << 8) | wp->rxbuf_conf[2*i+1];
+ tx_size = (wp->txbuf_conf[2*i] << 8) | wp->txbuf_conf[2*i+1];
+
+ w5300_write(wp, RMSR + 2*i, rx_size);
+ w5300_write(wp, TMSR + 2*i, tx_size);
+ }
+
+ /* Setting FIFO Memory Type (TX&RX) */
+ for (i = 0; i < txbuf_total / 8; ++i) {
+ mem_cfg <<= 1;
+ mem_cfg |= 1;
+ }
+ w5300_write(wp, MTYPER, mem_cfg);
+
+ /* Masking all interrupts */
+ w5300_write(wp, IMR, 0x0000);
+
+ return 0;
+}
+
+/* Interrupt Handler(ISR) */
+static irqreturn_t wiz_interrupt(int irq, void *dev_instance)
+{
+ struct net_device *dev = dev_instance;
+ struct wiz_private *wp = netdev_priv(dev);
+ int timeout = 100;
+ u16 isr, ssr;
+ int s;
+
+ isr = w5300_read(wp, IR);
+
+ /* Handle all pending interrupts in one pass. */
+ while (isr && timeout--) {
+ w5300_write(wp, IR, isr);
+
+ /* Find the channel that raised the interrupt. */
+ s = find_first_bit((ulong *)&isr, sizeof(u16));
+ ssr = w5300_read(wp, Sn_IR(s));
+ /* socket interrupt is cleared. */
+ w5300_write(wp, Sn_IR(s), ssr);
+
+ netif_dbg(wp, intr, wp->dev,
+ "ISR = %X, SSR = %X, s = %X\n",
+ isr, ssr, s);
+
+ if (likely(!s)) {
+ if (ssr & Sn_IR_RECV) {
+ /* Interrupt disable. */
+ w5300_interrupt_disable(wp);
+ /* Receiving by polling method */
+ napi_schedule(&wp->napi);
+ }
+ }
+
+ /* Is there any interrupt to be processed? */
+ isr = w5300_read(wp, IR);
+ }
+
+ return IRQ_HANDLED;
+}
+
+static int wiz_open(struct net_device *dev)
+{
+ struct wiz_private *wp = netdev_priv(dev);
+ int ret;
+
+ ret = request_irq(dev->irq, wiz_interrupt, IRQF_SHARED,
+ dev->name, dev);
+ if (ret < 0) {
+ netif_err(wp, ifup, wp->dev, "request_irq() error!\n");
+ return ret;
+ }
+ napi_enable(&wp->napi);
+ w5300_interrupt_enable(wp);
+
+ /* Sending OPEN command to use channel 0 as MACRAW mode. */
+ ret = w5300_channel_open(wp, Sn_MR_MACRAW_MF);
+ if (ret < 0) {
+ netif_err(wp, ifup, wp->dev, "w5300 channel open fail!\n");
+ w5300_interrupt_disable(wp);
+ napi_disable(&wp->napi);
+ free_irq(dev->irq, dev);
+ return ret;
+ }
+ netif_carrier_on(dev);
+ netif_start_queue(dev);
+
+ return 0;
+}
+
+static int wiz_close(struct net_device *dev)
+{
+ struct wiz_private *wp = netdev_priv(dev);
+ int timeout = 1000;
+
+ netif_stop_queue(dev);
+ napi_disable(&wp->napi);
+ netif_carrier_off(dev);
+ /* Interrupt masking of all channels */
+ w5300_write(wp, IMR, 0x0000);
+ w5300_write(wp, Sn_CR(0), Sn_CR_CLOSE);
+
+ while (timeout--) {
+ if (!w5300_read(wp, Sn_CR(0)))
+ break;
+ udelay(1);
+ }
+
+ free_irq(dev->irq, dev);
+
+ return 0;
+}
+
+static int w5300_send_data(struct wiz_private *wp, u8 *buf, ssize_t len)
+{
+ int i;
+ u16 send_data;
+ int timeout = 1000;
+
+ /* Write the packet into the Tx FIFO. */
+ for (i = 0; i < len; i += 2) {
+ send_data = (buf[i] << 8) | buf[i+1];
+ w5300_write(wp, Sn_TX_FIFO(0), send_data);
+ }
+
+ w5300_write(wp, Sn_TX_WRSR(0), (u16)(len >> 16));
+ w5300_write(wp, Sn_TX_WRSR(0) + 2, (u16)len);
+ w5300_write(wp, Sn_CR(0), Sn_CR_SEND);
+
+ while (timeout--) {
+ if (!w5300_read(wp, Sn_CR(0)))
+ return len;
+ udelay(1);
+ }
+
+ return -EBUSY;
+}
+
+/* Function to transmit data at the MACRAW mode */
+static int wiz_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+ struct wiz_private *wp = netdev_priv(dev);
+ int ret;
+
+ ret = w5300_send_data(wp, skb->data, skb->len);
+
+ /* Update statistics */
+ if (ret < 0) {
+ wp->stats.tx_dropped++;
+ } else {
+ wp->stats.tx_bytes += skb->len;
+ wp->stats.tx_packets++;
+ dev->trans_start = jiffies;
+ netif_dbg(wp, tx_done, wp->dev,
+ "tx done, packet size = %d\n", skb->len);
+ }
+ dev_kfree_skb(skb);
+
+ return NETDEV_TX_OK;
+}
+
+static struct net_device_stats *wiz_get_stats(struct net_device *dev)
+{
+ struct wiz_private *wp = netdev_priv(dev);
+
+ return &wp->stats;
+}
+
+/* Called when the multicast list or interface flags change. */
+static void wiz_set_multicast(struct net_device *dev)
+{
+ struct wiz_private *wp = netdev_priv(dev);
+ int ret;
+ u32 type = dev->flags & IFF_PROMISC ? Sn_MR_MACRAW : Sn_MR_MACRAW_MF;
+
+ ret = w5300_channel_open(wp, type);
+ if (ret < 0) {
+ netif_err(wp, ifup, wp->dev,
+ "w5300 channel open fail!\n");
+ }
+}
+
+static int wiz_set_mac_address(struct net_device *dev, void *addr)
+{
+ struct wiz_private *wp = netdev_priv(dev);
+ struct sockaddr *sock_addr = addr;
+
+ netif_dbg(wp, drv, wp->dev, "set mac address\n");
+
+ spin_lock(&wp->lock);
+ w5300_set_macaddr(wp, sock_addr->sa_data);
+ memcpy(dev->dev_addr, sock_addr->sa_data, dev->addr_len);
+ spin_unlock(&wp->lock);
+
+ return 0;
+}
+
+static void wiz_tx_timeout(struct net_device *dev)
+{
+ struct wiz_private *wp = netdev_priv(dev);
+ unsigned long flags;
+
+ netif_dbg(wp, timer, wp->dev, "Transmit timeout\n");
+
+ spin_lock_irqsave(&wp->lock, flags);
+
+ /* Initializing W5300 chip. */
+ if (w5300_reset(dev) < 0) {
+ netif_err(wp, timer, wp->dev, "w5300 reset fail!\n");
+ spin_unlock_irqrestore(&wp->lock, flags);
+ return;
+ }
+ /* Waking up network interface */
+ netif_wake_queue(dev);
+ spin_unlock_irqrestore(&wp->lock, flags);
+}
+
+/*
+ * NAPI poll function; handles reception only, in MACRAW mode.
+ * The ISR disables the channel interrupt when a RECV interrupt occurs
+ * and schedules this function, which drains the Rx FIFO and then
+ * re-enables the interrupt once reception is complete.
+ * Because RECV interrupts can arrive at short intervals, doing the
+ * receive work in the interrupt handler itself would load the system.
+ */
+static int wiz_rx_poll(struct napi_struct *napi, int budget)
+{
+ struct wiz_private *wp = container_of(napi, struct wiz_private, napi);
+ struct net_device *dev = wp->dev;
+ int npackets = 0;
+
+ /* Process received packets while the Rx FIFO holds any data. */
+ while (w5300_get_rxsize(wp, 0) > 0) {
+ struct sk_buff *skb;
+ u16 pktlen;
+ u32 crc;
+ int i;
+
+ /* The first 2 bytes hold the packet length. */
+ w5300_recv_data(wp, 0, (u8 *)&pktlen, 2);
+ pktlen = be16_to_cpu(pktlen);
+
+ netif_dbg(wp, rx_err, wp->dev, "pktlen = %d\n", pktlen);
+
+ /*
+ * Allocate the socket buffer that will hold the packet.
+ * The Ethernet header is 14 bytes long, so
+ * netdev_alloc_skb_ip_align() reserves NET_IP_ALIGN bytes
+ * of headroom to keep the IP header aligned.
+ */
+ skb = netdev_alloc_skb_ip_align(dev, pktlen);
+ if (!skb) {
+ /* Drain the packet and its 4-byte CRC from the FIFO. */
+ wp->stats.rx_dropped++;
+ for (i = 0; i < pktlen + 4; i += 2)
+ (void)w5300_read(wp, Sn_RX_FIFO(0));
+ continue;
+ }
+
+ /* Initializing the socket buffer; headroom is already reserved. */
+ skb->dev = dev;
+ skb_put(skb, pktlen);
+
+ /* Reading packets from W5300 Rx FIFO into socket buffer. */
+ w5300_recv_data(wp, 0, (u8 *)skb->data, pktlen);
+
+ /* Reading and discarding the 4-byte CRC. */
+ w5300_recv_data(wp, 0, (u8 *)&crc, 4);
+ crc = be32_to_cpu(crc);
+
+ /* The packet type is Ethernet. */
+ skb->protocol = eth_type_trans(skb, dev);
+
+ /* Passing packets to the upper stack (kernel). */
+ netif_receive_skb(skb);
+
+ /* Processing statistical information */
+ wp->stats.rx_packets++;
+ wp->stats.rx_bytes += pktlen;
+ wp->dev->last_rx = jiffies;
+ npackets++;
+
+ if (npackets >= budget)
+ break;
+ }
+
+ /* If fewer packets than the budget were processed when leaving
+ * the loop, reception is complete. */
+ if (npackets < budget) {
+ unsigned long flags;
+
+ spin_lock_irqsave(&wp->lock, flags);
+ w5300_interrupt_enable(wp);
+ __napi_complete(napi);
+ spin_unlock_irqrestore(&wp->lock, flags);
+ }
+ return npackets;
+}
+
+static const struct net_device_ops wiz_netdev_ops = {
+ .ndo_open = wiz_open,
+ .ndo_stop = wiz_close,
+ .ndo_validate_addr = eth_validate_addr,
+ .ndo_set_mac_address = wiz_set_mac_address,
+ .ndo_set_rx_mode = wiz_set_multicast,
+ .ndo_get_stats = wiz_get_stats,
+ .ndo_start_xmit = wiz_start_xmit,
+ .ndo_tx_timeout = wiz_tx_timeout,
+};
+
+/* Initialize W5300 driver. */
+static int __devinit w5300_drv_probe(struct platform_device *pdev)
+{
+ struct net_device *dev;
+ struct wiz_private *wp;
+ struct resource *res;
+ void __iomem *addr;
+ int ret;
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (!res) {
+ ret = -ENODEV;
+ goto out;
+ }
+
+ /* Request the chip register regions. */
+ if (!request_mem_region(res->start, resource_size(res), DEV_NAME)) {
+ ret = -EBUSY;
+ goto out;
+ }
+
+ /* Allocate the struct net_device that manages the W5300. */
+ dev = alloc_etherdev(sizeof(struct wiz_private));
+ if (!dev) {
+ ret = -ENOMEM;
+ goto release_region;
+ }
+
+ dev->dma = (unsigned char)-1;
+ dev->irq = platform_get_irq(pdev, 0);
+ wp = netdev_priv(dev);
+ wp->dev = dev;
+ wp->msg_enable = (debug < 0 ? W5300_DEF_MSG_ENABLE : debug);
+ addr = ioremap(res->start, SZ_1M);
+ if (!addr) {
+ ret = -ENOMEM;
+ goto release_both;
+ }
+
+ platform_set_drvdata(pdev, dev);
+ wp->base = addr;
+
+ spin_lock_init(&wp->lock);
+
+ /* Initialization of Rx/Tx FIFO size */
+ memcpy(wp->rxbuf_conf, w5300_rxbuf_conf, MAX_SOCK_NUM);
+ memcpy(wp->txbuf_conf, w5300_txbuf_conf, MAX_SOCK_NUM);
+
+ dev->base_addr = res->start;
+
+ memcpy(dev->dev_addr, w5300_defmac, dev->addr_len);
+ dev->netdev_ops = &wiz_netdev_ops;
+
+ /* Register NAPI with a weight of 16 packets per poll. */
+ netif_napi_add(dev, &wp->napi, wiz_rx_poll, 16);
+
+ dev->watchdog_timeo = msecs_to_jiffies(watchdog);
+
+ /* Reset the chip, then register; unwind everything on failure. */
+ ret = w5300_reset(dev);
+ if (ret == 0)
+ ret = register_netdev(dev);
+
+ if (ret != 0) {
+ platform_set_drvdata(pdev, NULL);
+ iounmap(addr);
+release_both:
+ free_netdev(dev);
+release_region:
+ release_mem_region(res->start, resource_size(res));
+ }
+out:
+ return ret;
+}
+
+static int __devexit w5300_drv_remove(struct platform_device *pdev)
+{
+ struct net_device *dev = platform_get_drvdata(pdev);
+ struct wiz_private *wp = netdev_priv(dev);
+ struct resource *res;
+
+ platform_set_drvdata(pdev, NULL);
+ unregister_netdev(dev);
+
+ iounmap(wp->base);
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (res != NULL)
+ release_mem_region(res->start, resource_size(res));
+
+ free_netdev(dev);
+
+ return 0;
+}
+
+#ifdef CONFIG_PM
+
+static int w5300_drv_suspend(struct platform_device *pdev, pm_message_t state)
+{
+ struct net_device *dev = platform_get_drvdata(pdev);
+
+ if (dev) {
+ struct wiz_private *wp = netdev_priv(dev);
+
+ if (netif_running(dev)) {
+ int timeout = 1000;
+
+ netif_carrier_off(dev);
+ netif_device_detach(dev);
+ w5300_write(wp, IMR, 0x0000);
+ w5300_write(wp, Sn_CR(0), Sn_CR_CLOSE);
+
+ while (timeout--) {
+ if (!w5300_read(wp, Sn_CR(0)))
+ return 0;
+ udelay(1);
+ }
+ return -EBUSY;
+ }
+ }
+ return 0;
+}
+
+static int w5300_drv_resume(struct platform_device *pdev)
+{
+ struct net_device *dev = platform_get_drvdata(pdev);
+ int ret = 0;
+
+ if (dev) {
+ struct wiz_private *wp = netdev_priv(dev);
+
+ if (netif_running(dev)) {
+ ret = w5300_reset(dev);
+ if (ret < 0)
+ goto out;
+
+ w5300_interrupt_enable(wp);
+
+ ret = w5300_channel_open(wp, Sn_MR_MACRAW_MF);
+ if (ret < 0)
+ goto out;
+
+ netif_carrier_on(dev);
+ netif_device_attach(dev);
+ }
+ }
+
+out:
+ return ret;
+}
+#endif /* CONFIG_PM */
+
+static struct platform_driver w5300_driver = {
+ .driver = {
+ .name = DEV_NAME,
+ .owner = THIS_MODULE,
+ },
+ .probe = w5300_drv_probe,
+ .remove = __devexit_p(w5300_drv_remove),
+#ifdef CONFIG_PM
+ .suspend = w5300_drv_suspend,
+ .resume = w5300_drv_resume,
+#endif
+};
+
+static int __init wiz_module_init(void)
+{
+ return platform_driver_register(&w5300_driver);
+}
+
+static void __exit wiz_module_exit(void)
+{
+ platform_driver_unregister(&w5300_driver);
+}
+
+module_init(wiz_module_init);
+module_exit(wiz_module_exit);
diff --git a/drivers/net/ethernet/wiznet/w5300.h b/drivers/net/ethernet/wiznet/w5300.h
new file mode 100644
index 0000000..bb6e181
--- /dev/null
+++ b/drivers/net/ethernet/wiznet/w5300.h
@@ -0,0 +1,121 @@
+#ifndef _W5300_H_
+#define _W5300_H_
+
+/* Maximum socket number. W5300 supports max 8 channels. */
+#define MAX_SOCK_NUM 8
+
+/* socket register */
+#define CH_BASE (0x200)
+
+/* size of each channel register map */
+#define CH_SIZE 0x40
+
+#define MR (0) /**< Mode register */
+#define IR (0x02) /**< Interrupt register */
+#define IMR (0x04) /**< Interrupt mask register */
+#define SHAR (0x08) /**< Source MAC register address */
+#define TMSR (0x20) /**< Transmit memory size register */
+#define RMSR (0x28) /**< Receive memory size register */
+
+/*
+ * Memory Type Register
+ * '1' - TX memory
+ * '0' - RX memory
+ */
+#define MTYPER (0x30)
+
+/* Chip ID register(=0x5300) */
+#define IDR (0xFE)
+#define IDR1 (IDR + 1)
+
+/* socket Mode register */
+#define Sn_MR(ch) (CH_BASE + (ch) * CH_SIZE + 0x00)
+#define Sn_MR1(ch) (Sn_MR(ch)+1)
+
+/* socket command register */
+#define Sn_CR(ch) (CH_BASE + (ch) * CH_SIZE + 0x02)
+#define Sn_CR1(ch) (Sn_CR(ch)+1)
+
+/* socket interrupt register */
+#define Sn_IR(ch) (CH_BASE + (ch) * CH_SIZE + 0x06)
+
+/* Transmit Size Register (Byte count) */
+#define Sn_TX_WRSR(ch) (CH_BASE + (ch) * CH_SIZE + 0x20)
+
+/* Transmit free memory size register (Byte count) */
+#define Sn_TX_FSR(ch) (CH_BASE + (ch) * CH_SIZE + 0x24)
+
+/* Received data size register (Byte count) */
+#define Sn_RX_RSR(ch) (CH_BASE + (ch) * CH_SIZE + 0x28)
+
+/* FIFO register for Transmit */
+#define Sn_TX_FIFO(ch) (CH_BASE + (ch) * CH_SIZE + 0x2E)
+
+/* FIFO register for Receive */
+#define Sn_RX_FIFO(ch) (CH_BASE + (ch) * CH_SIZE + 0x30)
+
+/* MODE register values */
+#define MR_DBW (1 << 15) /**< Data bus width bit of MR. */
+#define MR_MPF (1 << 14) /**< Mac layer pause frame bit of MR. */
+#define MR_WDF(x) ((x & 0x07) << 11) /**< Write data fetch time bit of MR. */
+#define MR_RDH (1 << 10) /**< Read data hold time bit of MR. */
+#define MR_FS (1 << 8) /**< FIFO swap bit of MR. */
+#define MR_RST (1 << 7) /**< S/W reset bit of MR. */
+#define MR_MT (1 << 5) /**< Memory test bit of MR. */
+#define MR_PB (1 << 4) /**< Ping block bit of MR. */
+#define MR_PPPoE (1 << 3) /**< PPPoE bit of MR. */
+#define MR_DBS (1 << 2) /**< Data bus swap of MR. */
+#define MR_IND (1 << 0) /**< Indirect mode bit of MR. */
+
+/* IR register values */
+#define IR_IPCF (1 << 7) /**< IP conflict bit of IR. */
+#define IR_DPUR (1 << 6) /**< Destination port unreachable bit of IR. */
+#define IR_PPPT (1 << 5) /**< PPPoE terminate bit of IR. */
+#define IR_FMTU (1 << 4) /**< Fragment MTU bit of IR. */
+#define IR_SnINT(n) (0x01 << n) /**< SOCKETn interrupt occurrence bit of IR. */
+
+/* Sn_MR values */
+#define Sn_MR_ALIGN (1 << 8) /**< Alignment bit of Sn_MR. */
+#define Sn_MR_MULTI (1 << 7) /**< Multicasting bit of Sn_MR. */
+#define Sn_MR_MF (1 << 6) /**< MAC filter bit of Sn_MR. */
+#define Sn_MR_IGMPv (1 << 5) /**< IGMP version bit of Sn_MR. */
+#define Sn_MR_ND (1 << 5) /**< No delayed ack bit of Sn_MR. */
+#define Sn_MR_CLOSE 0x00 /**< Protocol bits of Sn_MR. */
+#define Sn_MR_TCP 0x01 /**< Protocol bits of Sn_MR. */
+#define Sn_MR_UDP 0x02 /**< Protocol bits of Sn_MR. */
+#define Sn_MR_IPRAW 0x03 /**< Protocol bits of Sn_MR. */
+#define Sn_MR_MACRAW 0x04 /**< Protocol bits of Sn_MR. */
+#define Sn_MR_MACRAW_MF 0x44 /**< Protocol bits of Sn_MR */
+#define Sn_MR_PPPoE 0x05 /**< Protocol bits of Sn_MR. */
+
+/* Sn_CR values */
+#define Sn_CR_OPEN 0x01 /**< OPEN command value of Sn_CR. */
+#define Sn_CR_LISTEN 0x02 /**< LISTEN command value of Sn_CR. */
+#define Sn_CR_CONNECT 0x04 /**< CONNECT command value of Sn_CR. */
+#define Sn_CR_DISCON 0x08 /**< DISCONNECT command value of Sn_CR. */
+#define Sn_CR_CLOSE 0x10 /**< CLOSE command value of Sn_CR. */
+#define Sn_CR_SEND 0x20 /**< SEND command value of Sn_CR. */
+#define Sn_CR_SEND_MAC 0x21 /**< SEND_MAC command value of Sn_CR. */
+#define Sn_CR_SEND_KEEP 0x22 /**< SEND_KEEP command value of Sn_CR */
+#define Sn_CR_RECV 0x40 /**< RECV command value of Sn_CR */
+#define Sn_CR_PCON 0x23 /**< PCON command value of Sn_CR */
+#define Sn_CR_PDISCON 0x24 /**< PDISCON command value of Sn_CR */
+#define Sn_CR_PCR 0x25 /**< PCR command value of Sn_CR */
+#define Sn_CR_PCN 0x26 /**< PCN command value of Sn_CR */
+#define Sn_CR_PCJ 0x27 /**< PCJ command value of Sn_CR */
+
+/* Sn_IR values */
+#define Sn_IR_PRECV 0x80 /**< PPP receive bit of Sn_IR */
+#define Sn_IR_PFAIL 0x40 /**< PPP fail bit of Sn_IR */
+#define Sn_IR_PNEXT 0x20 /**< PPP next phase bit of Sn_IR */
+#define Sn_IR_SENDOK 0x10 /**< Send OK bit of Sn_IR */
+#define Sn_IR_TIMEOUT 0x08 /**< Timeout bit of Sn_IR */
+#define Sn_IR_RECV 0x04 /**< Receive bit of Sn_IR */
+#define Sn_IR_DISCON 0x02 /**< Disconnect bit of Sn_IR */
+#define Sn_IR_CON 0x01 /**< Connect bit of Sn_IR */
+
+/* W5300 register READ/WRITE functions (16-bit interface only). */
+#define w5300_write(wp, addr, val) writew(val, (wp->base + addr))
+#define w5300_read(wp, addr) readw((wp->base + addr))
+
+#endif /* _W5300_H_ */
--
1.7.1
^ permalink raw reply related
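The buffer-layout rules enforced by w5300_reset() and the Rx FIFO framing walked by wiz_rx_poll() in the W5300 patch above can be exercised outside the kernel. What follows is a user-space sketch under stated assumptions: the function names are hypothetical, the `fifo` array stands in for successive readw() calls on Sn_RX_FIFO(0), and the frame layout (2-byte length, frame data, 4-byte CRC, high byte first in each 16-bit word) is inferred from the driver code rather than checked against the W5300 datasheet.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SOCK_NUM 8

/*
 * Mirrors the checks and register packing in w5300_reset():
 * per-channel sizes (KB) must be <= 64, the TX total must be a
 * multiple of 8, two channels share each 16-bit TMSR/RMSR register
 * (even channel in the high byte), and MTYPER gets one '1' bit per
 * 8 KB TX block. Returns 0 on success, -1 on an invalid layout.
 */
static int w5300_pack_mem_config(const uint8_t rx_kb[MAX_SOCK_NUM],
				 const uint8_t tx_kb[MAX_SOCK_NUM],
				 uint16_t rmsr[4], uint16_t tmsr[4],
				 uint16_t *mtyper)
{
	unsigned int tx_total = 0, i;

	for (i = 0; i < MAX_SOCK_NUM; i++) {
		if (rx_kb[i] > 64 || tx_kb[i] > 64)
			return -1;
		tx_total += tx_kb[i];
	}
	if (tx_total % 8)
		return -1;

	for (i = 0; i < 4; i++) {
		rmsr[i] = (uint16_t)((rx_kb[2 * i] << 8) | rx_kb[2 * i + 1]);
		tmsr[i] = (uint16_t)((tx_kb[2 * i] << 8) | tx_kb[2 * i + 1]);
	}

	*mtyper = 0;
	for (i = 0; i < tx_total / 8; i++)
		*mtyper = (uint16_t)((*mtyper << 1) | 1);
	return 0;
}

/*
 * Unpacks one MACRAW frame starting at word *pos: a length word, the
 * frame bytes (two per word, high byte first, odd lengths rounded up
 * to a whole word), then two CRC words that are skipped. Returns the
 * frame length and advances *pos, or returns -1 without moving *pos
 * if the FIFO is too short or buf is too small.
 */
static int w5300_unpack_frame(const uint16_t *fifo, size_t nwords,
			      size_t *pos, uint8_t *buf, size_t buflen)
{
	size_t p = *pos, i, len, words;

	if (p >= nwords)
		return -1;
	len = fifo[p++];
	words = (len + 1) / 2;
	if (len > buflen || p + words + 2 > nwords)
		return -1;

	for (i = 0; i < len; i++) {
		uint16_t w = fifo[p + i / 2];

		buf[i] = (i % 2 == 0) ? (uint8_t)(w >> 8) : (uint8_t)(w & 0xFF);
	}
	*pos = p + words + 2;
	return (int)len;
}
```

With the driver's default configuration (64 KB each for channel 0 RX and TX), channel 0 lands in the high byte of RMSR/TMSR (0x4000) and MTYPER marks eight 8 KB TX blocks (0x00FF). Unlike w5300_recv_data(), the sketch never stores more than `len` bytes for an odd-length frame.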
* Re: [BUG] bonding : LOCKDEP warning
From: Eric W. Biederman @ 2011-10-22 8:43 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, David Miller
In-Reply-To: <1319268987.6180.21.camel@edumazet-laptop>
Eric Dumazet <eric.dumazet@gmail.com> writes:
> On latest net-next I got following splat
Eric, it is going to be a little while before I can test this, but I believe
we just need the one fix in the patch below. Can you verify that it fixes
your lockdep issue?
Thanks,
Eric
From 60aeafd8976a1117e118574ada44a79b69c75e70 Mon Sep 17 00:00:00 2001
From: Eric W. Biederman <ebiederm@xmission.com>
Date: Sat, 22 Oct 2011 01:36:18 -0700
Subject: [PATCH] bonding: Add a forgotten sysfs_attr_init on class_attr_bonding_masters
When I made class_attr_bonding_masters per network namespace and dynamically
allocated, I overlooked the need to call sysfs_attr_init. Oops.
This fixes the following lockdep splat:
[ 5.749651] bonding: Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
[ 5.749655] bonding: MII link monitoring set to 100 ms
[ 5.749676] BUG: key f49a831c not in .data!
[ 5.749677] ------------[ cut here ]------------
[ 5.749752] WARNING: at kernel/lockdep.c:2897 lockdep_init_map+0x1c3/0x460()
[ 5.749809] Hardware name: ProLiant BL460c G1
[ 5.749862] Modules linked in: bonding(+)
[ 5.749978] Pid: 3177, comm: modprobe Not tainted 3.1.0-rc9-02177-gf2d1a4e-dirty #1157
[ 5.750066] Call Trace:
[ 5.750120] [<c1352c2f>] ? printk+0x18/0x21
[ 5.750176] [<c103112d>] warn_slowpath_common+0x6d/0xa0
[ 5.750231] [<c1060133>] ? lockdep_init_map+0x1c3/0x460
[ 5.750287] [<c1060133>] ? lockdep_init_map+0x1c3/0x460
[ 5.750342] [<c103117d>] warn_slowpath_null+0x1d/0x20
[ 5.750398] [<c1060133>] lockdep_init_map+0x1c3/0x460
[ 5.750453] [<c1355ddd>] ? _raw_spin_unlock+0x1d/0x20
[ 5.750510] [<c11255c8>] ? sysfs_new_dirent+0x68/0x110
[ 5.750565] [<c1124d4b>] sysfs_add_file_mode+0x8b/0xe0
[ 5.750621] [<c1124db3>] sysfs_add_file+0x13/0x20
[ 5.750675] [<c1124e7c>] sysfs_create_file+0x1c/0x20
[ 5.750737] [<c1208f09>] class_create_file+0x19/0x20
[ 5.750794] [<c12c186f>] netdev_class_create_file+0xf/0x20
[ 5.750853] [<f85deaf4>] bond_create_sysfs+0x44/0x90 [bonding]
[ 5.750911] [<f8410947>] ? bond_create_proc_dir+0x1e/0x3e [bonding]
[ 5.750970] [<f841007e>] bond_net_init+0x7e/0x87 [bonding]
[ 5.751026] [<f8410000>] ? 0xf840ffff
[ 5.751080] [<c12abc7a>] ops_init.clone.4+0xba/0x100
[ 5.751135] [<c12abdb2>] ? register_pernet_subsys+0x12/0x30
[ 5.751191] [<c12abd03>] register_pernet_operations.clone.3+0x43/0x80
[ 5.751249] [<c12abdb9>] register_pernet_subsys+0x19/0x30
[ 5.751306] [<f84108b9>] bonding_init+0x832/0x8a2 [bonding]
[ 5.751363] [<c10011f0>] do_one_initcall+0x30/0x160
[ 5.751420] [<f8410087>] ? bond_net_init+0x87/0x87 [bonding]
[ 5.751477] [<c106d5cf>] sys_init_module+0xef/0x1890
[ 5.751533] [<c1356490>] sysenter_do_call+0x12/0x36
[ 5.751588] ---[ end trace 89f492d83a7f5006 ]---
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
drivers/net/bonding/bond_sysfs.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 6044ff8..5a20804 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1675,6 +1675,7 @@ int bond_create_sysfs(struct bond_net *bn)
int ret;
bn->class_attr_bonding_masters = class_attr_bonding_masters;
+ sysfs_attr_init(&bn->class_attr_bonding_masters.attr);
ret = netdev_class_create_file(&bn->class_attr_bonding_masters);
/*
--
1.7.2.5
* Re: [PATCH] net: add sysctl allow_so_priority for SO_PRIORITY setsockopt
From: David Täht @ 2011-10-22 9:01 UTC
To: Maciej Żenczykowski; +Cc: David Miller, netdev
In-Reply-To: <CAHo-OoyVSbsxb8U3Y5WCNRsxjr00g1O3HJcT1fmu5cmP5i-JsA@mail.gmail.com>
On 10/22/2011 10:27 AM, Maciej Żenczykowski wrote:
>> I also don't see why we'd want to allow disabling this either.
I have been watching this and the other capability patches go by with
interest. My use case is that I would like to run "named" as a
non-root user but have it vary the DSCP (TOS) field on a per-connection
basis:
TCP zone transfers = bulk
TCP/UDP queries = something like interactive | CS5 (this moves DNS
queries into the VI queue on wireless, which can also be done with
SO_PRIORITY)
Having TOS modification as a grantable capability, and otherwise
restricting it, makes some sense in a world of otherwise unrestricted
user programs in the cloud. However, setting CS1, which reduces
traffic from best effort to background, should still be allowed
universally.
Another way to hammer down someone else's TOS settings (guest machine,
external router, etc.) would be to do it in iptables, but doing it on
a fine-grained basis would currently take up to 63 iptables
rules...
Lastly...
The skb->priority field needs some rethinking. In the case of wireless,
it selects a different TX queue based on magic values (see net/wireless/util.c):
/* skb->priority values from 256->263 are magic values to
* directly indicate a specific 802.1d priority. This is used
* to allow 802.1d priority to be passed directly in from VLAN
* tags, etc.
*/
if (skb->priority >= 256 && skb->priority <= 263)
return skb->priority - 256;
Classification is an Aristotelian rathole!
>> I really hate these patches that offer ways to disable things
>> that normally work, and thus break apps when the non-default
>> is selected.
> Well... the purpose of settings like this is precisely to break functionality
> when the default is not set ;-)
>
>> I have a feeling that the situation you're trying to account for
>> is some cloud where people run random stuff that you don't
>> control.
> Yes. I have control of the kernel, I have control of root, and I have control of
> some daemons that are running on the machine, but I don't really have
> control of the entirety of userspace. Some of it I have source code for
> and could audit to guarantee correctness (but I can't really enforce
> that on the users; ultimately they can run any binary),
> and for some of it I don't even have that. Either way, it's much
> easier to delegate setting policy to userspace management daemon(s)
> and leave enforcing it to the kernel.
> This is just one more such knob.
>
>> But you didn't specify this, and we just have to guess. Why don't you
>> describe the specific situation where you want to modify this setting?
>> Please do this instead of just talking about what the side effects are
>> inside of the kernel. That's much less interesting when it comes to
>> patches like this.
> Very well, that's a good point.
>
> Here's an attempt to provide some insight.
>
> I am attempting to allow apps that are neither fully code-audited nor
> fully trusted to run in a cgroup-containerized environment, with many
> apps in many containers (not 1:1; there are hierarchies) on a single kernel.
> The apps fall in the "believed not to be actively malicious" class, but
> are very likely to be buggy, or written by ill-advised programmers working
> from wrong, outdated or otherwise incorrect documentation. I cannot rely
> on unprivileged userspace getting things right.
> I have to have some mechanism to grant these apps permission to
> use specific levels of network fabric priority. For this I have
> the aforementioned per-cgroup allowed TOS settings. VLANs are not appropriate
> because a client with high-priority network privileges is allowed to send a
> request to a server with no special priority permissions.
> (There are further patches to support TCP TOS reflection so the server
> can automatically respond with the client's priority.)
>
> Multiqueue networking combined with hardware priority queues and XPS
> wants to use skb->priority plus the active CPU for TX queue selection.
> In this particular case, TX queue selection should be based on the
> TOS priority.
> Setting TOS automatically sets sk_priority (and hence skb->priority).
> So all is good, as long as userspace doesn't go and change the
> sk_priority field via SO_PRIORITY and break the mapping.
>
> As a further note:
>
> Some of these apps may be a little more special, a little more
> audited, and a little more trusted.
> Enough so that they might be granted CAP_NET_RAW, but not enough so
> that they can get CAP_NET_ADMIN.
> Hence the general desire for CAP_NET_ADMIN to control general
> machine-global networking state, but not per-socket or per-packet
> settings. I.e., bringing an interface up or down affects everyone
> (hence it must be CAP_NET_ADMIN and much more tightly controlled),
> while spoofing a packet doesn't really negatively affect anyone (you
> can't assume the network is trusted, so there can be external sources
> of spoofing or eavesdropping anyway).
>
> ---
>
> I could attempt to publish the vast majority of our internal
> networking code base (there isn't really anything secret in there),
> but it's based on 2.6.34, and even after two years of attempting to
> clean it up and refactor it (along with a rebase from 2.6.26, all
> while actively continuing development) I'm still not at the point where
> I would consider this a particularly useful course of action
> (there are a lot of bugfixes of bugfixes of crappy patches in there,
> plus hacks, plus tons of backports from upstream, and tons of code
> which is upstream but slightly different than what we have internally,
> because we had it first and pushed v2 upstream, etc.). Instead I'm
> trying to get the low-hanging fruit out of the way, rebase our
> patches onto probably 3.2 or 3.3, likely sending some more your way
> during the process, and see where that leaves us. Basically I'm trying
> to reduce the delta. We will always have internal-only patches, but the
> fewer, the less burden for us, hence I'm trying to upstream the ones I
> believe to be potentially useful externally. Obviously,
> whatever patches you don't accept, we'll still keep around locally.
>
> Maciej
--
Dave Täht
* Re: [BUG] bonding : LOCKDEP warning
From: Eric Dumazet @ 2011-10-22 9:05 UTC
To: Eric W. Biederman; +Cc: netdev, David Miller
In-Reply-To: <m14nz1l8mc.fsf@fess.ebiederm.org>
On Saturday, 22 October 2011 at 01:43 -0700, Eric W. Biederman wrote:
> Eric Dumazet <eric.dumazet@gmail.com> writes:
>
> > On latest net-next I got following splat
>
> Eric, it is going to be a little while before I can test this, but I believe
> we just need the one fix in the patch below. Can you verify that this fixes
> your lockdep issue?
>
> Thanks,
>
> Eric
>
>
> From 60aeafd8976a1117e118574ada44a79b69c75e70 Mon Sep 17 00:00:00 2001
> From: Eric W. Biederman <ebiederm@xmission.com>
> Date: Sat, 22 Oct 2011 01:36:18 -0700
> Subject: [PATCH] bonding: Add a forgotten sysfs_attr_init on class_attr_bonding_masters
>
> When I made class_attr_bonding_masters per network namespace and dynamically
> allocated, I overlooked the need to call sysfs_attr_init. Oops.
>
> This fixes the following lockdep splat:
>
> [...]
>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
> drivers/net/bonding/bond_sysfs.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
> index 6044ff8..5a20804 100644
> --- a/drivers/net/bonding/bond_sysfs.c
> +++ b/drivers/net/bonding/bond_sysfs.c
> @@ -1675,6 +1675,7 @@ int bond_create_sysfs(struct bond_net *bn)
> int ret;
>
> bn->class_attr_bonding_masters = class_attr_bonding_masters;
> + sysfs_attr_init(&bn->class_attr_bonding_masters.attr);
>
> ret = netdev_class_create_file(&bn->class_attr_bonding_masters);
> /*
Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Thanks a lot, Eric!
* Re: [BUG] bonding : LOCKDEP warning
From: David Miller @ 2011-10-22 9:09 UTC
To: eric.dumazet; +Cc: ebiederm, netdev
In-Reply-To: <1319274349.6180.23.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 22 Oct 2011 11:05:49 +0200
> Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Patchwork says "parse error"
I'll fix it up this time, but please do not use free form
tags like this in the future. Thanks.
* Re: [BUG] bonding : LOCKDEP warning
From: Eric Dumazet @ 2011-10-22 9:19 UTC
To: David Miller; +Cc: ebiederm, netdev
In-Reply-To: <20111022.050914.551507702374659667.davem@davemloft.net>
On Saturday, 22 October 2011 at 05:09 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sat, 22 Oct 2011 11:05:49 +0200
>
> > Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> Patchwork says "parse error"
>
> I'll fix it up this time, but please do not use free form
> tags like this in the future. Thanks.
Strange, it seems quite common these days; you're the first one to
complain. Maybe complain to Patchwork?
Reported-and-tested-by: Shlomo Pongratz <shlomop@mellanox.com>
Reported-and-tested-by: Simon Kirby <sim@hostway.ca>
Reported-and-tested-by: Amir Vadai <amirv@dev.mellanox.co.il>
Reported-and-tested-by: Alexandre Oliva <aoliva@redhat.com>
Reported-and-tested-by: Rocko Requin <rockorequin@hotmail.com>
Reported-and-tested-by: Richard Cochran <richardcochran@gmail.com>
Reported-and-tested-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Reported-and-tested-by: Hor Jiun Shyong <jiunshyong@gmail.com>
Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-and-tested-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Reported-and-tested-by: Jan Teichmann <jan.teichmann@gmail.com>
Reported-and-tested-by: Arnaud Lacombe <lacombar@gmail.com>
Reported-and-tested-by: Jim Bray <jimsantelmo@gmail.com>
Reported-and-tested-by: Muhammad Khurram Khan
Reported-and-tested-by: Matej Laitl <matej@laitl.cz>
Reported-and-tested-by: Thomas Seilund <tps@netmaster.dk>
Reported-and-tested-by: René Fritz <rene@colorcube.de>
Reported-and-tested-by: Randy Dunlap <rdunlap@xenotime.net>
Reported-and-tested-by: William Light <wrl@illest.net>
Reported-and-tested-by: Xiaotian Feng <xtfeng@gmail.com>
Reported-and-tested-by: Dave Jones <davej@redhat.com>
Reported-and-tested-by: Xiaotian Feng <xtfeng@gmail.com>
Reported-and-tested-by: Joachim Eastwood <manabian@gmail.com>
Reported-and-tested-by: Pavel Roskin <proski@gnu.org>
Reported-and-tested-by: Christian Casteyde <casteyde.christian@free.fr>
Reported-and-tested-by: Sebastian Siewior <sebastian@breakpoint.cc>
...