* Re: [Bugme-new] [Bug 41152] New: kernel 3.0 and above fails to handle vlan id 0 (802.1p) packets properly without hardware acceleration
From: Jiri Pirko @ 2011-08-18 16:37 UTC (permalink / raw)
To: Mike Auty; +Cc: Andrew Morton, bugme-daemon, netdev
In-Reply-To: <4E4C4549.6020802@gmail.com>
Thu, Aug 18, 2011 at 12:48:41AM CEST, mike.auty@gmail.com wrote:
>On 17/08/11 11:59, Jiri Pirko wrote:
>>
>> I just obtained very similar card (8086:422b). Going to look at it right
>> away.
>>
>> One more thing. What do you use to generate vlan0 tagged packets? I'm
>> using pktgen with "vlan_id 0". Could you please check whether it behaves
>> the same for you?
>>
>
>Sorry, I haven't been using pktgen. I've got an actual device (a
>Samsung android phone) which seems to tag all normal outbound packets
>with this type of vlan tag. I only discovered a month ago that I needed
>the 8021q module to be able to talk to it, and then suddenly it stopped
>working once I moved to the 3.0 kernel.
>
>I might not have made it clear, but the packets are received (inasmuch
>as the packet is definitely sent, and it's seen by tools such as
>Wireshark), but no reply is ever sent. I've attached packet logs from
>the 3.0.1 kernel and the 2.6.39.3 kernel. Oddly the tagging only seems
>to be used on the first SYN,ACK packet, but again I don't know enough
>about the pipeline or what the Samsung kernel's doing to cause that.
>
>I hope that's of some help? I may be able to get systemtap support
>rolled into my kernel tomorrow at some point, but if not then it will
>have to wait until the weekend. I don't know if that will provide
>useful information for debugging this, but I am happy to run whatever
>tests I can to figure this out...
>
>Mike 5:)
Patch posted:
http://patchwork.ozlabs.org/patch/110535/
Sorry, I forgot to cc you, Mike. Thanks a lot for the report!
Jirka
^ permalink raw reply
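For readers following the bug report above: a "vlan id 0 (802.1p)" frame carries an 802.1Q tag whose VLAN ID field is zero, so only the priority bits are meaningful. The following is a minimal userspace sketch of how the tag control information splits into its fields. It is not the kernel's code path; the struct and function names are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three fields of the 16-bit TCI. */
struct vlan_tci {
    uint8_t  pcp;  /* priority code point, top 3 bits */
    uint8_t  dei;  /* drop eligible indicator, 1 bit */
    uint16_t vid;  /* VLAN identifier, low 12 bits */
};

/*
 * Parse the 802.1Q tag of a raw Ethernet frame.
 * Layout assumed: 6 bytes dst MAC, 6 bytes src MAC, TPID 0x8100, TCI.
 * Returns false if the frame is too short or carries no 802.1Q tag.
 */
bool parse_vlan_tci(const uint8_t *frame, size_t len, struct vlan_tci *out)
{
    if (len < 16)
        return false;

    uint16_t tpid = (uint16_t)((frame[12] << 8) | frame[13]);
    if (tpid != 0x8100)
        return false;

    uint16_t tci = (uint16_t)((frame[14] << 8) | frame[15]);
    out->pcp = (uint8_t)(tci >> 13);
    out->dei = (uint8_t)((tci >> 12) & 0x1);
    out->vid = (uint16_t)(tci & 0x0fff);
    return true;
}

/* A priority-tagged frame is one whose VLAN ID is 0. */
bool is_priority_tagged(const struct vlan_tci *t)
{
    return t->vid == 0;
}
```

A frame for which is_priority_tagged() holds belongs to no particular VLAN, which is why a receiver is expected to process it like an untagged frame.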
* Re: [RFC] bridge: allow passing link-local multicast
From: Stephen Hemminger @ 2011-08-18 16:39 UTC (permalink / raw)
To: Nick Carter; +Cc: Ed Swierk, netdev, David Lamparter, bridge
In-Reply-To: <CAEJpZP2FGhPmTi0+eS+QRhj4y+aqfQHnEUrjmOOLHUay1SuAKg@mail.gmail.com>
On Thu, 18 Aug 2011 16:52:45 +0100
Nick Carter <ncarter100@gmail.com> wrote:
> On 18 August 2011 16:10, Stephen Hemminger <shemminger@vyatta.com> wrote:
> > On Thu, 18 Aug 2011 16:06:19 +0100
> > Nick Carter <ncarter100@gmail.com> wrote:
> >
> >> Why can't we use the 802.1D specified STP group address to identify?
> >> The existing code uses that address.
> >> I know you said on another thread that there are people using other addresses.
> >> Who are these people?
> >> Are they following any standard?
> >> What address / address range are they using?
> >
> > The group address can be reprogrammed, and it is settable on other
> > routing equipment. People do it to create spanning tree domains.
> >
> But before the new
> + if (!is_stp_bpdu(skb) && br_forward_link_local)
> check, we have already checked
> if (unlikely(is_link_local(dest))) {
> So the frame must have a link-local destination. If the reprogrammed
> group address is outside of the link-local range then the new code in
> this patch will never be hit. If the reprogrammed group address is in
> the link-local range then I'd suggest my previous group_fwd_mask patch
> is cleaner and more flexible.
The problem is that the group_fwd_mask is specific to the address,
not the protocol.
^ permalink raw reply
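To make the address-versus-protocol distinction in the exchange above concrete, here is a small userspace sketch. It assumes the mask semantics usually described for group_fwd_mask, namely that bit N selects destination 01:80:C2:00:00:0N; that semantic and the should_forward() helper are assumptions for illustration, not code taken from either patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True for the bridge-reserved range 01:80:C2:00:00:00 .. 01:80:C2:00:00:0F. */
bool is_link_local(const uint8_t dest[6])
{
    static const uint8_t prefix[5] = { 0x01, 0x80, 0xc2, 0x00, 0x00 };

    return memcmp(dest, prefix, sizeof prefix) == 0 && (dest[5] & 0xf0) == 0;
}

/*
 * Hypothetical forwarding decision driven purely by the destination
 * address: bit N of the mask allows 01:80:C2:00:00:0N to be forwarded.
 * Nothing here inspects the payload, so the decision cannot tell which
 * protocol is actually using a given address.
 */
bool should_forward(const uint8_t dest[6], uint16_t group_fwd_mask)
{
    if (!is_link_local(dest))
        return true;  /* outside the reserved range: normal forwarding */

    return (group_fwd_mask & (1u << dest[5])) != 0;
}
```

With a mask of 1 << 3, frames to 01:80:C2:00:00:03 (the 802.1X PAE address) would be forwarded while frames to 01:80:C2:00:00:00 (the default STP address) would not. The check keys on the address alone, which is the limitation raised in the reply.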
* [patch v2] net: netdev-features.txt update to Documentation/networking/00-INDEX
From: Willem de Bruijn @ 2011-08-18 16:44 UTC (permalink / raw)
To: netdev; +Cc: davem
Update netdev-features.txt entry in 00-INDEX to incorporate
feedback by Michał Mirosław.
v2: restored tabs that were inadvertently changed to spaces in v1.
sorry for the error.
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
Documentation/networking/00-INDEX | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX
index 811252b..bbce121 100644
--- a/Documentation/networking/00-INDEX
+++ b/Documentation/networking/00-INDEX
@@ -135,7 +135,7 @@ multiqueue.txt
netconsole.txt
- The network console module netconsole.ko: configuration and notes.
netdev-features.txt
- - Network interface "feature mess and how to get out from it alive".
+ - Network interface features API description.
netdevices.txt
- info on network device driver functions exported to the kernel.
netif-msg.txt
--
1.7.3.1
^ permalink raw reply related
* network protocol
From: Augusto Salazar @ 2011-08-18 17:07 UTC (permalink / raw)
To: netdev@vger.kernel.org
Greetings,
How do I create a protocol handler that catches all traffic: incoming packets before any other protocol sees them, and outgoing packets before they go to the lower layer?
Why do I need this? Because I want to modify the packets.
For example:
add VLAN tags to incoming packets so my VLAN-aware Linux handles them, and remove them on the way out so my non-VLAN-aware PC can handle the packets.
I created a module using dev_add_pack; it works well as a sniffer, but it does not prevent the other protocols from getting the packet.
As a test, after the "sniffing" I called netif_rx instead of destroying the packet. I ended up in a loop where I get one packet, send it on for processing, and get it back again (at least that is how I understand it).
Any idea how to do this?
I am hoping to achieve this without patching the kernel.
BR,
AUGUSTO SALAZAR
^ permalink raw reply
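The packet modification described in the question above (add a VLAN tag on the way in, strip it on the way out) can be illustrated on a plain byte buffer. This is only a userspace sketch of the frame rewrite itself, with hypothetical function names; it does not show where such a rewrite would hook into the kernel receive or transmit path.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_ADDRS_LEN 12  /* destination + source MAC */
#define VLAN_TAG_LEN  4   /* TPID (2 bytes) + TCI (2 bytes) */

/*
 * Insert an 802.1Q tag after the MAC addresses.
 * buf holds len bytes of frame and has room for cap bytes in total.
 * Returns the new frame length, or 0 if the frame is too short or
 * the buffer is too small.
 */
size_t vlan_push(uint8_t *buf, size_t len, size_t cap, uint16_t tci)
{
    if (len < MAC_ADDRS_LEN + 2 || len + VLAN_TAG_LEN > cap)
        return 0;

    /* Shift the original EtherType and payload right by four bytes. */
    memmove(buf + MAC_ADDRS_LEN + VLAN_TAG_LEN, buf + MAC_ADDRS_LEN,
            len - MAC_ADDRS_LEN);
    buf[12] = 0x81;
    buf[13] = 0x00;
    buf[14] = (uint8_t)(tci >> 8);
    buf[15] = (uint8_t)(tci & 0xff);
    return len + VLAN_TAG_LEN;
}

/*
 * Remove an 802.1Q tag if one is present.
 * Returns the new frame length, or 0 if the frame carries no tag.
 */
size_t vlan_pop(uint8_t *buf, size_t len)
{
    if (len < MAC_ADDRS_LEN + VLAN_TAG_LEN + 2)
        return 0;
    if (buf[12] != 0x81 || buf[13] != 0x00)
        return 0;

    /* Shift the inner EtherType and payload left over the tag. */
    memmove(buf + MAC_ADDRS_LEN, buf + MAC_ADDRS_LEN + VLAN_TAG_LEN,
            len - MAC_ADDRS_LEN - VLAN_TAG_LEN);
    return len - VLAN_TAG_LEN;
}
```

Pushing and then popping a tag restores the original frame byte for byte, which is the round trip the question asks for between the VLAN-aware and non-VLAN-aware hosts.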
* Re: [patch net-2.6] vlan: reset headers on accel emulation path
From: Greg KH @ 2011-08-18 18:16 UTC (permalink / raw)
To: Jiri Pirko; +Cc: netdev, davem, kaber, shemminger, eric.dumazet
In-Reply-To: <1313685345-2417-1-git-send-email-jpirko@redhat.com>
On Thu, Aug 18, 2011 at 06:35:45PM +0200, Jiri Pirko wrote:
> It is, after all, necessary to reset the headers here. The reason is that we
> cannot depend on them being reset in __netif_receive_skb once the skb is
> reinjected. For incoming vlanids without vlan_dev, vlan_do_receive()
> returns false with skb != NULL and __netif_receive_skb continues; the skb is
> not reinjected.
>
> This might be good material for 3.0-stable as well
<formletter>
This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read Documentation/stable_kernel_rules.txt
for how to do this properly.
</formletter>
^ permalink raw reply
* Re: [net-next 03/10] seeq: Move the SEEQ drivers
From: Ralf Baechle @ 2011-08-18 19:36 UTC (permalink / raw)
To: Jeff Kirsher; +Cc: davem, netdev, gospo, sassmann, Russell King, Hamish Coleman
In-Reply-To: <1313134384-7287-4-git-send-email-jeffrey.t.kirsher@intel.com>
On Fri, Aug 12, 2011 at 12:32:57AM -0700, Jeff Kirsher wrote:
> Move the drivers that use SEEQ chipset into drivers/net/ethernet/seeq
> and make the necessary Kconfig and Makefile changes.
>
> CC: Russell King <linux@arm.linux.org.uk>
> CC: Hamish Coleman <hamish@zot.apana.org.au>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
> MAINTAINERS | 3 +-
> drivers/net/Kconfig | 18 -----------
> drivers/net/Makefile | 2 -
> drivers/net/arm/Kconfig | 7 ----
> drivers/net/arm/Makefile | 1 -
> drivers/net/ethernet/Kconfig | 1 +
> drivers/net/ethernet/Makefile | 1 +
> drivers/net/ethernet/seeq/Kconfig | 45 +++++++++++++++++++++++++++
> drivers/net/ethernet/seeq/Makefile | 7 ++++
> drivers/net/{arm => ethernet/seeq}/ether3.c | 0
> drivers/net/{arm => ethernet/seeq}/ether3.h | 0
> drivers/net/{ => ethernet/seeq}/seeq8005.c | 0
> drivers/net/{ => ethernet/seeq}/seeq8005.h | 0
> drivers/net/{ => ethernet/seeq}/sgiseeq.c | 0
> drivers/net/{ => ethernet/seeq}/sgiseeq.h | 0
> 15 files changed, 56 insertions(+), 29 deletions(-)
> create mode 100644 drivers/net/ethernet/seeq/Kconfig
> create mode 100644 drivers/net/ethernet/seeq/Makefile
> rename drivers/net/{arm => ethernet/seeq}/ether3.c (100%)
> rename drivers/net/{arm => ethernet/seeq}/ether3.h (100%)
> rename drivers/net/{ => ethernet/seeq}/seeq8005.c (100%)
> rename drivers/net/{ => ethernet/seeq}/seeq8005.h (100%)
> rename drivers/net/{ => ethernet/seeq}/sgiseeq.c (100%)
> rename drivers/net/{ => ethernet/seeq}/sgiseeq.h (100%)
This makes a lot more sense than shoving the Seeq drivers into
drivers/net/ethernet/sgi/ - even though sgiseeq depends on an SGI IP22/IP28
specific DMA engine.
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Ralf
^ permalink raw reply
* Re: [Bugme-new] [Bug 41152] New: kernel 3.0 and above fails to handle vlan id 0 (802.1p) packets properly without hardware acceleration
From: Mike Auty @ 2011-08-18 19:39 UTC (permalink / raw)
To: Jiri Pirko; +Cc: Andrew Morton, bugme-daemon, netdev
In-Reply-To: <20110818163702.GA1911@minipsycho>
On 18/08/11 17:37, Jiri Pirko wrote:
>
> Patch posted:
> http://patchwork.ozlabs.org/patch/110535/
>
> Sorry, I forgot to cc you, Mike. Thanks a lot for the report!
No problem,
Thanks very much for the speedy fix! I've applied the patch and can
confirm it solves my problem. I look forward to seeing it hit the
mainline... 5:)
Mike 5:)
^ permalink raw reply
* Re: [net-next 02/10] ioc3-eth/meth: Move the SGI drivers
From: Ralf Baechle @ 2011-08-18 19:46 UTC (permalink / raw)
To: Jeff Kirsher; +Cc: davem, netdev, gospo, sassmann
In-Reply-To: <1313134384-7287-3-git-send-email-jeffrey.t.kirsher@intel.com>
On Fri, Aug 12, 2011 at 12:32:56AM -0700, Jeff Kirsher wrote:
> diff --git a/drivers/net/ethernet/sgi/Kconfig b/drivers/net/ethernet/sgi/Kconfig
> new file mode 100644
> index 0000000..3098594
> --- /dev/null
> +++ b/drivers/net/ethernet/sgi/Kconfig
> @@ -0,0 +1,34 @@
> +#
> +# SGI device configuration
> +#
> +
> +config NET_VENDOR_SGI
> + bool "SGI devices"
> + depends on (PCI && SGI_IP27) || SGI_IP32
Can you make NET_VENDOR_SGI default to y for these systems? There is
normally no reason other than maybe testing to ever disable NET_VENDOR_SGI
as these NICs are all on the motherboard.
Otherwise ok. Thanks,
Ralf
^ permalink raw reply
* [PATCH] PM: add macro to test for runtime PM events
From: Alan Stern @ 2011-08-18 20:06 UTC (permalink / raw)
To: Greg KH, Rafael J. Wysocki
Cc: Linux-pm mailing list, USB list, netdev, linux-bluetooth,
linux-input, Takashi Iwai
This patch (as1482) adds a macro for testing whether or not a
pm_message value represents an autosuspend or autoresume (i.e., a
runtime PM) event. Encapsulating this notion seems preferable to
open-coding the test all over the place.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
---
This is a minor change in the PM API, but most of the affected files
are in the USB subsystem. Therefore either Rafael or Greg might prefer
to accept this patch.
Documentation/usb/power-management.txt | 8 ++++----
drivers/bluetooth/btusb.c | 2 +-
drivers/hid/hid-picolcd.c | 2 +-
drivers/hid/usbhid/hid-core.c | 7 +++----
drivers/net/usb/usbnet.c | 2 +-
drivers/net/wimax/i2400m/usb.c | 4 ++--
drivers/usb/class/cdc-acm.c | 2 +-
drivers/usb/class/cdc-wdm.c | 6 +++---
drivers/usb/core/driver.c | 9 ++++-----
drivers/usb/core/hcd.c | 4 ++--
drivers/usb/core/hub.c | 10 +++++-----
drivers/usb/serial/sierra.c | 2 +-
drivers/usb/serial/usb_wwan.c | 2 +-
include/linux/pm.h | 2 ++
sound/usb/card.c | 2 +-
15 files changed, 32 insertions(+), 32 deletions(-)
Index: usb-3.1/include/linux/pm.h
===================================================================
--- usb-3.1.orig/include/linux/pm.h
+++ usb-3.1/include/linux/pm.h
@@ -366,6 +366,8 @@ extern struct dev_pm_ops generic_subsys_
#define PMSG_AUTO_RESUME ((struct pm_message) \
{ .event = PM_EVENT_AUTO_RESUME, })
+#define PMSG_IS_AUTO(msg) (((msg).event & PM_EVENT_AUTO) != 0)
+
/**
* Device run-time power management status.
*
Index: usb-3.1/Documentation/usb/power-management.txt
===================================================================
--- usb-3.1.orig/Documentation/usb/power-management.txt
+++ usb-3.1/Documentation/usb/power-management.txt
@@ -439,10 +439,10 @@ cause autosuspends to fail with -EBUSY i
device.
External suspend calls should never be allowed to fail in this way,
-only autosuspend calls. The driver can tell them apart by checking
-the PM_EVENT_AUTO bit in the message.event argument to the suspend
-method; this bit will be set for internal PM events (autosuspend) and
-clear for external PM events.
+only autosuspend calls. The driver can tell them apart by applying
+the PMSG_IS_AUTO() macro to the message argument to the suspend
+method; it will return True for internal PM events (autosuspend) and
+False for external PM events.
Mutual exclusion
Index: usb-3.1/drivers/net/usb/usbnet.c
===================================================================
--- usb-3.1.orig/drivers/net/usb/usbnet.c
+++ usb-3.1/drivers/net/usb/usbnet.c
@@ -1470,7 +1470,7 @@ int usbnet_suspend (struct usb_interface
if (!dev->suspend_count++) {
spin_lock_irq(&dev->txq.lock);
/* don't autosuspend while transmitting */
- if (dev->txq.qlen && (message.event & PM_EVENT_AUTO)) {
+ if (dev->txq.qlen && PMSG_IS_AUTO(message)) {
spin_unlock_irq(&dev->txq.lock);
return -EBUSY;
} else {
Index: usb-3.1/drivers/net/wimax/i2400m/usb.c
===================================================================
--- usb-3.1.orig/drivers/net/wimax/i2400m/usb.c
+++ usb-3.1/drivers/net/wimax/i2400m/usb.c
@@ -599,7 +599,7 @@ void i2400mu_disconnect(struct usb_inter
*
* As well, the device might refuse going to sleep for whichever
* reason. In this case we just fail. For system suspend/hibernate,
- * we *can't* fail. We check PM_EVENT_AUTO to see if the
+ * we *can't* fail. We check PMSG_IS_AUTO to see if the
* suspend call comes from the USB stack or from the system and act
* in consequence.
*
@@ -615,7 +615,7 @@ int i2400mu_suspend(struct usb_interface
struct i2400m *i2400m = &i2400mu->i2400m;
#ifdef CONFIG_PM
- if (pm_msg.event & PM_EVENT_AUTO)
+ if (PMSG_IS_AUTO(pm_msg))
is_autosuspend = 1;
#endif
Index: usb-3.1/sound/usb/card.c
===================================================================
--- usb-3.1.orig/sound/usb/card.c
+++ usb-3.1/sound/usb/card.c
@@ -628,7 +628,7 @@ static int usb_audio_suspend(struct usb_
if (chip == (void *)-1L)
return 0;
- if (!(message.event & PM_EVENT_AUTO)) {
+ if (!PMSG_IS_AUTO(message)) {
snd_power_change_state(chip->card, SNDRV_CTL_POWER_D3hot);
if (!chip->num_suspended_intf++) {
list_for_each(p, &chip->pcm_list) {
Index: usb-3.1/drivers/bluetooth/btusb.c
===================================================================
--- usb-3.1.orig/drivers/bluetooth/btusb.c
+++ usb-3.1/drivers/bluetooth/btusb.c
@@ -1103,7 +1103,7 @@ static int btusb_suspend(struct usb_inte
return 0;
spin_lock_irq(&data->txlock);
- if (!((message.event & PM_EVENT_AUTO) && data->tx_in_flight)) {
+ if (!(PMSG_IS_AUTO(message) && data->tx_in_flight)) {
set_bit(BTUSB_SUSPENDING, &data->flags);
spin_unlock_irq(&data->txlock);
} else {
Index: usb-3.1/drivers/hid/hid-picolcd.c
===================================================================
--- usb-3.1.orig/drivers/hid/hid-picolcd.c
+++ usb-3.1/drivers/hid/hid-picolcd.c
@@ -2409,7 +2409,7 @@ static int picolcd_raw_event(struct hid_
#ifdef CONFIG_PM
static int picolcd_suspend(struct hid_device *hdev, pm_message_t message)
{
- if (message.event & PM_EVENT_AUTO)
+ if (PMSG_IS_AUTO(message))
return 0;
picolcd_suspend_backlight(hid_get_drvdata(hdev));
Index: usb-3.1/drivers/hid/usbhid/hid-core.c
===================================================================
--- usb-3.1.orig/drivers/hid/usbhid/hid-core.c
+++ usb-3.1/drivers/hid/usbhid/hid-core.c
@@ -1332,7 +1332,7 @@ static int hid_suspend(struct usb_interf
struct usbhid_device *usbhid = hid->driver_data;
int status;
- if (message.event & PM_EVENT_AUTO) {
+ if (PMSG_IS_AUTO(message)) {
spin_lock_irq(&usbhid->lock); /* Sync with error handler */
if (!test_bit(HID_RESET_PENDING, &usbhid->iofl)
&& !test_bit(HID_CLEAR_HALT, &usbhid->iofl)
@@ -1367,7 +1367,7 @@ static int hid_suspend(struct usb_interf
return -EIO;
}
- if (!ignoreled && (message.event & PM_EVENT_AUTO)) {
+ if (!ignoreled && PMSG_IS_AUTO(message)) {
spin_lock_irq(&usbhid->lock);
if (test_bit(HID_LED_ON, &usbhid->iofl)) {
spin_unlock_irq(&usbhid->lock);
@@ -1380,8 +1380,7 @@ static int hid_suspend(struct usb_interf
hid_cancel_delayed_stuff(usbhid);
hid_cease_io(usbhid);
- if ((message.event & PM_EVENT_AUTO) &&
- test_bit(HID_KEYS_PRESSED, &usbhid->iofl)) {
+ if (PMSG_IS_AUTO(message) && test_bit(HID_KEYS_PRESSED, &usbhid->iofl)) {
/* lost race against keypresses */
status = hid_start_in(hid);
if (status < 0)
Index: usb-3.1/drivers/usb/class/cdc-acm.c
===================================================================
--- usb-3.1.orig/drivers/usb/class/cdc-acm.c
+++ usb-3.1/drivers/usb/class/cdc-acm.c
@@ -1305,7 +1305,7 @@ static int acm_suspend(struct usb_interf
struct acm *acm = usb_get_intfdata(intf);
int cnt;
- if (message.event & PM_EVENT_AUTO) {
+ if (PMSG_IS_AUTO(message)) {
int b;
spin_lock_irq(&acm->write_lock);
Index: usb-3.1/drivers/usb/class/cdc-wdm.c
===================================================================
--- usb-3.1.orig/drivers/usb/class/cdc-wdm.c
+++ usb-3.1/drivers/usb/class/cdc-wdm.c
@@ -798,11 +798,11 @@ static int wdm_suspend(struct usb_interf
dev_dbg(&desc->intf->dev, "wdm%d_suspend\n", intf->minor);
/* if this is an autosuspend the caller does the locking */
- if (!(message.event & PM_EVENT_AUTO))
+ if (!PMSG_IS_AUTO(message))
mutex_lock(&desc->lock);
spin_lock_irq(&desc->iuspin);
- if ((message.event & PM_EVENT_AUTO) &&
+ if (PMSG_IS_AUTO(message) &&
(test_bit(WDM_IN_USE, &desc->flags)
|| test_bit(WDM_RESPONDING, &desc->flags))) {
spin_unlock_irq(&desc->iuspin);
@@ -815,7 +815,7 @@ static int wdm_suspend(struct usb_interf
kill_urbs(desc);
cancel_work_sync(&desc->rxwork);
}
- if (!(message.event & PM_EVENT_AUTO))
+ if (!PMSG_IS_AUTO(message))
mutex_unlock(&desc->lock);
return rv;
Index: usb-3.1/drivers/usb/core/driver.c
===================================================================
--- usb-3.1.orig/drivers/usb/core/driver.c
+++ usb-3.1/drivers/usb/core/driver.c
@@ -1046,8 +1046,7 @@ static int usb_resume_device(struct usb_
/* Non-root devices on a full/low-speed bus must wait for their
* companion high-speed root hub, in case a handoff is needed.
*/
- if (!(msg.event & PM_EVENT_AUTO) && udev->parent &&
- udev->bus->hs_companion)
+ if (!PMSG_IS_AUTO(msg) && udev->parent && udev->bus->hs_companion)
device_pm_wait_for_dev(&udev->dev,
&udev->bus->hs_companion->root_hub->dev);
@@ -1075,7 +1074,7 @@ static int usb_suspend_interface(struct
if (driver->suspend) {
status = driver->suspend(intf, msg);
- if (status && !(msg.event & PM_EVENT_AUTO))
+ if (status && !PMSG_IS_AUTO(msg))
dev_err(&intf->dev, "%s error %d\n",
"suspend", status);
} else {
@@ -1189,7 +1188,7 @@ static int usb_suspend_both(struct usb_d
status = usb_suspend_interface(udev, intf, msg);
/* Ignore errors during system sleep transitions */
- if (!(msg.event & PM_EVENT_AUTO))
+ if (!PMSG_IS_AUTO(msg))
status = 0;
if (status != 0)
break;
@@ -1199,7 +1198,7 @@ static int usb_suspend_both(struct usb_d
status = usb_suspend_device(udev, msg);
/* Again, ignore errors during system sleep transitions */
- if (!(msg.event & PM_EVENT_AUTO))
+ if (!PMSG_IS_AUTO(msg))
status = 0;
}
Index: usb-3.1/drivers/usb/core/hcd.c
===================================================================
--- usb-3.1.orig/drivers/usb/core/hcd.c
+++ usb-3.1/drivers/usb/core/hcd.c
@@ -1960,7 +1960,7 @@ int hcd_bus_suspend(struct usb_device *r
int old_state = hcd->state;
dev_dbg(&rhdev->dev, "bus %s%s\n",
- (msg.event & PM_EVENT_AUTO ? "auto-" : ""), "suspend");
+ (PMSG_IS_AUTO(msg) ? "auto-" : ""), "suspend");
if (HCD_DEAD(hcd)) {
dev_dbg(&rhdev->dev, "skipped %s of dead bus\n", "suspend");
return 0;
@@ -1996,7 +1996,7 @@ int hcd_bus_resume(struct usb_device *rh
int old_state = hcd->state;
dev_dbg(&rhdev->dev, "usb %s%s\n",
- (msg.event & PM_EVENT_AUTO ? "auto-" : ""), "resume");
+ (PMSG_IS_AUTO(msg) ? "auto-" : ""), "resume");
if (HCD_DEAD(hcd)) {
dev_dbg(&rhdev->dev, "skipped %s of dead bus\n", "resume");
return 0;
Index: usb-3.1/drivers/usb/core/hub.c
===================================================================
--- usb-3.1.orig/drivers/usb/core/hub.c
+++ usb-3.1/drivers/usb/core/hub.c
@@ -2342,7 +2342,7 @@ int usb_port_suspend(struct usb_device *
dev_dbg(&udev->dev, "won't remote wakeup, status %d\n",
status);
/* bail if autosuspend is requested */
- if (msg.event & PM_EVENT_AUTO)
+ if (PMSG_IS_AUTO(msg))
return status;
}
}
@@ -2367,12 +2367,12 @@ int usb_port_suspend(struct usb_device *
USB_CTRL_SET_TIMEOUT);
/* System sleep transitions should never fail */
- if (!(msg.event & PM_EVENT_AUTO))
+ if (!PMSG_IS_AUTO(msg))
status = 0;
} else {
/* device has up to 10 msec to fully suspend */
dev_dbg(&udev->dev, "usb %ssuspend\n",
- (msg.event & PM_EVENT_AUTO ? "auto-" : ""));
+ (PMSG_IS_AUTO(msg) ? "auto-" : ""));
usb_set_device_state(udev, USB_STATE_SUSPENDED);
msleep(10);
}
@@ -2523,7 +2523,7 @@ int usb_port_resume(struct usb_device *u
} else {
/* drive resume for at least 20 msec */
dev_dbg(&udev->dev, "usb %sresume\n",
- (msg.event & PM_EVENT_AUTO ? "auto-" : ""));
+ (PMSG_IS_AUTO(msg) ? "auto-" : ""));
msleep(25);
/* Virtual root hubs can trigger on GET_PORT_STATUS to
@@ -2625,7 +2625,7 @@ static int hub_suspend(struct usb_interf
udev = hdev->children [port1-1];
if (udev && udev->can_submit) {
dev_warn(&intf->dev, "port %d nyet suspended\n", port1);
- if (msg.event & PM_EVENT_AUTO)
+ if (PMSG_IS_AUTO(msg))
return -EBUSY;
}
}
Index: usb-3.1/drivers/usb/serial/sierra.c
===================================================================
--- usb-3.1.orig/drivers/usb/serial/sierra.c
+++ usb-3.1/drivers/usb/serial/sierra.c
@@ -1009,7 +1009,7 @@ static int sierra_suspend(struct usb_ser
struct sierra_intf_private *intfdata;
int b;
- if (message.event & PM_EVENT_AUTO) {
+ if (PMSG_IS_AUTO(message)) {
intfdata = serial->private;
spin_lock_irq(&intfdata->susp_lock);
b = intfdata->in_flight;
Index: usb-3.1/drivers/usb/serial/usb_wwan.c
===================================================================
--- usb-3.1.orig/drivers/usb/serial/usb_wwan.c
+++ usb-3.1/drivers/usb/serial/usb_wwan.c
@@ -651,7 +651,7 @@ int usb_wwan_suspend(struct usb_serial *
dbg("%s entered", __func__);
- if (message.event & PM_EVENT_AUTO) {
+ if (PMSG_IS_AUTO(message)) {
spin_lock_irq(&intfdata->susp_lock);
b = intfdata->in_flight;
spin_unlock_irq(&intfdata->susp_lock);
^ permalink raw reply
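The effect of the macro from the patch above can be tried outside the kernel with stand-in definitions. In this sketch the PMSG_IS_AUTO() definition is copied from the patch; the event constants and struct pm_message are simplified stand-ins for what include/linux/pm.h provides, and example_suspend() is a hypothetical driver callback modelled loosely on the usbnet hunk.

```c
#include <errno.h>

/* Stand-ins for the definitions in include/linux/pm.h (assumed values). */
#define PM_EVENT_SUSPEND      0x0002
#define PM_EVENT_AUTO         0x0400
#define PM_EVENT_AUTO_SUSPEND (PM_EVENT_AUTO | PM_EVENT_SUSPEND)

struct pm_message {
    int event;
};

/* As added by the patch. */
#define PMSG_IS_AUTO(msg) (((msg).event & PM_EVENT_AUTO) != 0)

/*
 * Hypothetical suspend callback: an autosuspend may be refused while
 * transmissions are pending, but a system suspend must not fail.
 */
int example_suspend(struct pm_message message, int tx_pending)
{
    if (tx_pending && PMSG_IS_AUTO(message))
        return -EBUSY;

    return 0;
}
```

Because the macro compares the masked value against zero, it evaluates to exactly 0 or 1, so it can be negated or combined with && without extra parentheses at the call site.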
* [PATCH net-next] MAINTAINERS: qlcnic
From: Anirban Chakraborty @ 2011-08-18 20:03 UTC (permalink / raw)
To: davem
Cc: netdev, Dept_NX_Linux_NIC_Driver, Amit Kumar Salecha,
Anirban Chakraborty
Please apply the change. Thanks.
-Anirban
Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
---
MAINTAINERS | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index d374c6f..92e051d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5248,8 +5248,8 @@ F: Documentation/networking/LICENSE.qla3xxx
F: drivers/net/ethernet/qlogic/qla3xxx.*
QLOGIC QLCNIC (1/10)Gb ETHERNET DRIVER
-M: Amit Kumar Salecha <amit.salecha@qlogic.com>
M: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
+M: Sony Chacko <sony.chacko@qlogic.com>
M: linux-driver@qlogic.com
L: netdev@vger.kernel.org
S: Supported
--
1.7.4.1
^ permalink raw reply related
* Re: [PATCH] PM: add macro to test for runtime PM events
From: Rafael J. Wysocki @ 2011-08-18 20:52 UTC (permalink / raw)
To: Alan Stern
Cc: Greg KH, Linux-pm mailing list, USB list, netdev, linux-bluetooth,
linux-input, Takashi Iwai
In-Reply-To: <Pine.LNX.4.44L0.1108181600020.1628-100000@iolanthe.rowland.org>
Hi,
On Thursday, August 18, 2011, Alan Stern wrote:
> This patch (as1482) adds a macro for testing whether or not a
> pm_message value represents an autosuspend or autoresume (i.e., a
> runtime PM) event. Encapsulating this notion seems preferable to
> open-coding the test all over the place.
>
> Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
>
> ---
>
> This is a minor change in the PM API, but most of the affected files
> are in the USB subsystem. Therefore either Rafael or Greg might prefer
> to accept this patch.
I can take the patch if Greg is fine with that.
Thanks,
Rafael
> Documentation/usb/power-management.txt | 8 ++++----
> drivers/bluetooth/btusb.c | 2 +-
> drivers/hid/hid-picolcd.c | 2 +-
> drivers/hid/usbhid/hid-core.c | 7 +++----
> drivers/net/usb/usbnet.c | 2 +-
> drivers/net/wimax/i2400m/usb.c | 4 ++--
> drivers/usb/class/cdc-acm.c | 2 +-
> drivers/usb/class/cdc-wdm.c | 6 +++---
> drivers/usb/core/driver.c | 9 ++++-----
> drivers/usb/core/hcd.c | 4 ++--
> drivers/usb/core/hub.c | 10 +++++-----
> drivers/usb/serial/sierra.c | 2 +-
> drivers/usb/serial/usb_wwan.c | 2 +-
> include/linux/pm.h | 2 ++
> sound/usb/card.c | 2 +-
> 15 files changed, 32 insertions(+), 32 deletions(-)
>
> Index: usb-3.1/include/linux/pm.h
> ===================================================================
> --- usb-3.1.orig/include/linux/pm.h
> +++ usb-3.1/include/linux/pm.h
> @@ -366,6 +366,8 @@ extern struct dev_pm_ops generic_subsys_
> #define PMSG_AUTO_RESUME ((struct pm_message) \
> { .event = PM_EVENT_AUTO_RESUME, })
>
> +#define PMSG_IS_AUTO(msg) (((msg).event & PM_EVENT_AUTO) != 0)
> +
> /**
> * Device run-time power management status.
> *
> Index: usb-3.1/Documentation/usb/power-management.txt
> ===================================================================
> --- usb-3.1.orig/Documentation/usb/power-management.txt
> +++ usb-3.1/Documentation/usb/power-management.txt
> @@ -439,10 +439,10 @@ cause autosuspends to fail with -EBUSY i
> device.
>
> External suspend calls should never be allowed to fail in this way,
> -only autosuspend calls. The driver can tell them apart by checking
> -the PM_EVENT_AUTO bit in the message.event argument to the suspend
> -method; this bit will be set for internal PM events (autosuspend) and
> -clear for external PM events.
> +only autosuspend calls. The driver can tell them apart by applying
> +the PMSG_IS_AUTO() macro to the message argument to the suspend
> +method; it will return True for internal PM events (autosuspend) and
> +False for external PM events.
>
>
> Mutual exclusion
> Index: usb-3.1/drivers/net/usb/usbnet.c
> ===================================================================
> --- usb-3.1.orig/drivers/net/usb/usbnet.c
> +++ usb-3.1/drivers/net/usb/usbnet.c
> @@ -1470,7 +1470,7 @@ int usbnet_suspend (struct usb_interface
> if (!dev->suspend_count++) {
> spin_lock_irq(&dev->txq.lock);
> /* don't autosuspend while transmitting */
> - if (dev->txq.qlen && (message.event & PM_EVENT_AUTO)) {
> + if (dev->txq.qlen && PMSG_IS_AUTO(message)) {
> spin_unlock_irq(&dev->txq.lock);
> return -EBUSY;
> } else {
> Index: usb-3.1/drivers/net/wimax/i2400m/usb.c
> ===================================================================
> --- usb-3.1.orig/drivers/net/wimax/i2400m/usb.c
> +++ usb-3.1/drivers/net/wimax/i2400m/usb.c
> @@ -599,7 +599,7 @@ void i2400mu_disconnect(struct usb_inter
> *
> * As well, the device might refuse going to sleep for whichever
> * reason. In this case we just fail. For system suspend/hibernate,
> - * we *can't* fail. We check PM_EVENT_AUTO to see if the
> + * we *can't* fail. We check PMSG_IS_AUTO to see if the
> * suspend call comes from the USB stack or from the system and act
> * in consequence.
> *
> @@ -615,7 +615,7 @@ int i2400mu_suspend(struct usb_interface
> struct i2400m *i2400m = &i2400mu->i2400m;
>
> #ifdef CONFIG_PM
> - if (pm_msg.event & PM_EVENT_AUTO)
> + if (PMSG_IS_AUTO(pm_msg))
> is_autosuspend = 1;
> #endif
>
> Index: usb-3.1/sound/usb/card.c
> ===================================================================
> --- usb-3.1.orig/sound/usb/card.c
> +++ usb-3.1/sound/usb/card.c
> @@ -628,7 +628,7 @@ static int usb_audio_suspend(struct usb_
> if (chip == (void *)-1L)
> return 0;
>
> - if (!(message.event & PM_EVENT_AUTO)) {
> + if (!PMSG_IS_AUTO(message)) {
> snd_power_change_state(chip->card, SNDRV_CTL_POWER_D3hot);
> if (!chip->num_suspended_intf++) {
> list_for_each(p, &chip->pcm_list) {
> Index: usb-3.1/drivers/bluetooth/btusb.c
> ===================================================================
> --- usb-3.1.orig/drivers/bluetooth/btusb.c
> +++ usb-3.1/drivers/bluetooth/btusb.c
> @@ -1103,7 +1103,7 @@ static int btusb_suspend(struct usb_inte
> return 0;
>
> spin_lock_irq(&data->txlock);
> - if (!((message.event & PM_EVENT_AUTO) && data->tx_in_flight)) {
> + if (!(PMSG_IS_AUTO(message) && data->tx_in_flight)) {
> set_bit(BTUSB_SUSPENDING, &data->flags);
> spin_unlock_irq(&data->txlock);
> } else {
> Index: usb-3.1/drivers/hid/hid-picolcd.c
> ===================================================================
> --- usb-3.1.orig/drivers/hid/hid-picolcd.c
> +++ usb-3.1/drivers/hid/hid-picolcd.c
> @@ -2409,7 +2409,7 @@ static int picolcd_raw_event(struct hid_
> #ifdef CONFIG_PM
> static int picolcd_suspend(struct hid_device *hdev, pm_message_t message)
> {
> - if (message.event & PM_EVENT_AUTO)
> + if (PMSG_IS_AUTO(message))
> return 0;
>
> picolcd_suspend_backlight(hid_get_drvdata(hdev));
> Index: usb-3.1/drivers/hid/usbhid/hid-core.c
> ===================================================================
> --- usb-3.1.orig/drivers/hid/usbhid/hid-core.c
> +++ usb-3.1/drivers/hid/usbhid/hid-core.c
> @@ -1332,7 +1332,7 @@ static int hid_suspend(struct usb_interf
> struct usbhid_device *usbhid = hid->driver_data;
> int status;
>
> - if (message.event & PM_EVENT_AUTO) {
> + if (PMSG_IS_AUTO(message)) {
> spin_lock_irq(&usbhid->lock); /* Sync with error handler */
> if (!test_bit(HID_RESET_PENDING, &usbhid->iofl)
> && !test_bit(HID_CLEAR_HALT, &usbhid->iofl)
> @@ -1367,7 +1367,7 @@ static int hid_suspend(struct usb_interf
> return -EIO;
> }
>
> - if (!ignoreled && (message.event & PM_EVENT_AUTO)) {
> + if (!ignoreled && PMSG_IS_AUTO(message)) {
> spin_lock_irq(&usbhid->lock);
> if (test_bit(HID_LED_ON, &usbhid->iofl)) {
> spin_unlock_irq(&usbhid->lock);
> @@ -1380,8 +1380,7 @@ static int hid_suspend(struct usb_interf
> hid_cancel_delayed_stuff(usbhid);
> hid_cease_io(usbhid);
>
> - if ((message.event & PM_EVENT_AUTO) &&
> - test_bit(HID_KEYS_PRESSED, &usbhid->iofl)) {
> + if (PMSG_IS_AUTO(message) && test_bit(HID_KEYS_PRESSED, &usbhid->iofl)) {
> /* lost race against keypresses */
> status = hid_start_in(hid);
> if (status < 0)
> Index: usb-3.1/drivers/usb/class/cdc-acm.c
> ===================================================================
> --- usb-3.1.orig/drivers/usb/class/cdc-acm.c
> +++ usb-3.1/drivers/usb/class/cdc-acm.c
> @@ -1305,7 +1305,7 @@ static int acm_suspend(struct usb_interf
> struct acm *acm = usb_get_intfdata(intf);
> int cnt;
>
> - if (message.event & PM_EVENT_AUTO) {
> + if (PMSG_IS_AUTO(message)) {
> int b;
>
> spin_lock_irq(&acm->write_lock);
> Index: usb-3.1/drivers/usb/class/cdc-wdm.c
> ===================================================================
> --- usb-3.1.orig/drivers/usb/class/cdc-wdm.c
> +++ usb-3.1/drivers/usb/class/cdc-wdm.c
> @@ -798,11 +798,11 @@ static int wdm_suspend(struct usb_interf
> dev_dbg(&desc->intf->dev, "wdm%d_suspend\n", intf->minor);
>
> /* if this is an autosuspend the caller does the locking */
> - if (!(message.event & PM_EVENT_AUTO))
> + if (!PMSG_IS_AUTO(message))
> mutex_lock(&desc->lock);
> spin_lock_irq(&desc->iuspin);
>
> - if ((message.event & PM_EVENT_AUTO) &&
> + if (PMSG_IS_AUTO(message) &&
> (test_bit(WDM_IN_USE, &desc->flags)
> || test_bit(WDM_RESPONDING, &desc->flags))) {
> spin_unlock_irq(&desc->iuspin);
> @@ -815,7 +815,7 @@ static int wdm_suspend(struct usb_interf
> kill_urbs(desc);
> cancel_work_sync(&desc->rxwork);
> }
> - if (!(message.event & PM_EVENT_AUTO))
> + if (!PMSG_IS_AUTO(message))
> mutex_unlock(&desc->lock);
>
> return rv;
> Index: usb-3.1/drivers/usb/core/driver.c
> ===================================================================
> --- usb-3.1.orig/drivers/usb/core/driver.c
> +++ usb-3.1/drivers/usb/core/driver.c
> @@ -1046,8 +1046,7 @@ static int usb_resume_device(struct usb_
> /* Non-root devices on a full/low-speed bus must wait for their
> * companion high-speed root hub, in case a handoff is needed.
> */
> - if (!(msg.event & PM_EVENT_AUTO) && udev->parent &&
> - udev->bus->hs_companion)
> + if (!PMSG_IS_AUTO(msg) && udev->parent && udev->bus->hs_companion)
> device_pm_wait_for_dev(&udev->dev,
> &udev->bus->hs_companion->root_hub->dev);
>
> @@ -1075,7 +1074,7 @@ static int usb_suspend_interface(struct
>
> if (driver->suspend) {
> status = driver->suspend(intf, msg);
> - if (status && !(msg.event & PM_EVENT_AUTO))
> + if (status && !PMSG_IS_AUTO(msg))
> dev_err(&intf->dev, "%s error %d\n",
> "suspend", status);
> } else {
> @@ -1189,7 +1188,7 @@ static int usb_suspend_both(struct usb_d
> status = usb_suspend_interface(udev, intf, msg);
>
> /* Ignore errors during system sleep transitions */
> - if (!(msg.event & PM_EVENT_AUTO))
> + if (!PMSG_IS_AUTO(msg))
> status = 0;
> if (status != 0)
> break;
> @@ -1199,7 +1198,7 @@ static int usb_suspend_both(struct usb_d
> status = usb_suspend_device(udev, msg);
>
> /* Again, ignore errors during system sleep transitions */
> - if (!(msg.event & PM_EVENT_AUTO))
> + if (!PMSG_IS_AUTO(msg))
> status = 0;
> }
>
> Index: usb-3.1/drivers/usb/core/hcd.c
> ===================================================================
> --- usb-3.1.orig/drivers/usb/core/hcd.c
> +++ usb-3.1/drivers/usb/core/hcd.c
> @@ -1960,7 +1960,7 @@ int hcd_bus_suspend(struct usb_device *r
> int old_state = hcd->state;
>
> dev_dbg(&rhdev->dev, "bus %s%s\n",
> - (msg.event & PM_EVENT_AUTO ? "auto-" : ""), "suspend");
> + (PMSG_IS_AUTO(msg) ? "auto-" : ""), "suspend");
> if (HCD_DEAD(hcd)) {
> dev_dbg(&rhdev->dev, "skipped %s of dead bus\n", "suspend");
> return 0;
> @@ -1996,7 +1996,7 @@ int hcd_bus_resume(struct usb_device *rh
> int old_state = hcd->state;
>
> dev_dbg(&rhdev->dev, "usb %s%s\n",
> - (msg.event & PM_EVENT_AUTO ? "auto-" : ""), "resume");
> + (PMSG_IS_AUTO(msg) ? "auto-" : ""), "resume");
> if (HCD_DEAD(hcd)) {
> dev_dbg(&rhdev->dev, "skipped %s of dead bus\n", "resume");
> return 0;
> Index: usb-3.1/drivers/usb/core/hub.c
> ===================================================================
> --- usb-3.1.orig/drivers/usb/core/hub.c
> +++ usb-3.1/drivers/usb/core/hub.c
> @@ -2342,7 +2342,7 @@ int usb_port_suspend(struct usb_device *
> dev_dbg(&udev->dev, "won't remote wakeup, status %d\n",
> status);
> /* bail if autosuspend is requested */
> - if (msg.event & PM_EVENT_AUTO)
> + if (PMSG_IS_AUTO(msg))
> return status;
> }
> }
> @@ -2367,12 +2367,12 @@ int usb_port_suspend(struct usb_device *
> USB_CTRL_SET_TIMEOUT);
>
> /* System sleep transitions should never fail */
> - if (!(msg.event & PM_EVENT_AUTO))
> + if (!PMSG_IS_AUTO(msg))
> status = 0;
> } else {
> /* device has up to 10 msec to fully suspend */
> dev_dbg(&udev->dev, "usb %ssuspend\n",
> - (msg.event & PM_EVENT_AUTO ? "auto-" : ""));
> + (PMSG_IS_AUTO(msg) ? "auto-" : ""));
> usb_set_device_state(udev, USB_STATE_SUSPENDED);
> msleep(10);
> }
> @@ -2523,7 +2523,7 @@ int usb_port_resume(struct usb_device *u
> } else {
> /* drive resume for at least 20 msec */
> dev_dbg(&udev->dev, "usb %sresume\n",
> - (msg.event & PM_EVENT_AUTO ? "auto-" : ""));
> + (PMSG_IS_AUTO(msg) ? "auto-" : ""));
> msleep(25);
>
> /* Virtual root hubs can trigger on GET_PORT_STATUS to
> @@ -2625,7 +2625,7 @@ static int hub_suspend(struct usb_interf
> udev = hdev->children [port1-1];
> if (udev && udev->can_submit) {
> dev_warn(&intf->dev, "port %d nyet suspended\n", port1);
> - if (msg.event & PM_EVENT_AUTO)
> + if (PMSG_IS_AUTO(msg))
> return -EBUSY;
> }
> }
> Index: usb-3.1/drivers/usb/serial/sierra.c
> ===================================================================
> --- usb-3.1.orig/drivers/usb/serial/sierra.c
> +++ usb-3.1/drivers/usb/serial/sierra.c
> @@ -1009,7 +1009,7 @@ static int sierra_suspend(struct usb_ser
> struct sierra_intf_private *intfdata;
> int b;
>
> - if (message.event & PM_EVENT_AUTO) {
> + if (PMSG_IS_AUTO(message)) {
> intfdata = serial->private;
> spin_lock_irq(&intfdata->susp_lock);
> b = intfdata->in_flight;
> Index: usb-3.1/drivers/usb/serial/usb_wwan.c
> ===================================================================
> --- usb-3.1.orig/drivers/usb/serial/usb_wwan.c
> +++ usb-3.1/drivers/usb/serial/usb_wwan.c
> @@ -651,7 +651,7 @@ int usb_wwan_suspend(struct usb_serial *
>
> dbg("%s entered", __func__);
>
> - if (message.event & PM_EVENT_AUTO) {
> + if (PMSG_IS_AUTO(message)) {
> spin_lock_irq(&intfdata->susp_lock);
> b = intfdata->in_flight;
> spin_unlock_irq(&intfdata->susp_lock);
>
>
>
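For readers following the conversion above, here is a minimal userspace
sketch of what the PMSG_IS_AUTO() helper boils down to. The struct, the
constant values and filter_suspend_status() are stand-ins chosen for
illustration (assumptions, not copied from the patch); the helper mirrors
the "ignore errors during system sleep transitions" pattern seen in the
usb_suspend_both() hunk.

```c
/* Userspace stand-ins for the kernel definitions; values are assumptions. */
#define PM_EVENT_SUSPEND 0x0002
#define PM_EVENT_AUTO    0x0400

typedef struct pm_message {
	int event;
} pm_message_t;

/* True when the PM message was generated by runtime (auto) suspend. */
#define PMSG_IS_AUTO(msg) (((msg).event & PM_EVENT_AUTO) != 0)

/*
 * Hypothetical helper: a suspend error is only propagated for autosuspend;
 * for a system sleep transition it is dropped.
 */
static int filter_suspend_status(pm_message_t msg, int status)
{
	if (!PMSG_IS_AUTO(msg))
		status = 0;
	return status;
}
```

Wrapping the bit test in a macro keeps the call sites short enough that
several of the two-line conditions in the patch fit on a single line.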
^ permalink raw reply
* [RFC 0/0] Introducing a generic socket offload framework
From: San Mehat @ 2011-08-18 22:07 UTC (permalink / raw)
To: davem, mst, rusty
Cc: linux-kernel, virtualization, netdev, digitaleric, mikew, miche,
maccarro
TL;DR
-----
In this RFC we propose the introduction of the concept of hardware socket
offload to the Linux kernel. Patches will accompany this RFC in a few days,
but we felt we had enough on the design to solicit constructive discussion
from the community at large.
BACKGROUND
----------
Many applications within enterprise organizations suitable for virtualization
neither require nor desire a connection to the full internal Ethernet+IP
network. Rather, some specific socket connections -- for processing HTTP
requests, making database queries, or interacting with storage -- are needed,
and IP networking in the application may typically be discouraged for
applications that do not sit on the edge of the network. Furthermore, removing
the application's need to understand where its inputs come from / go to within
the networking fabric can make save/restore/migration of a virtualized
application substantially easier - especially in large clusters and on fabrics
which can't handle IP re-assignment.
REQUIREMENTS
------------
* Allow VM connectivity to internal resources without requiring additional
network resources (IPs, VLANs, etc).
* Easy authentication of network streams from a trusted domain (vmm).
* Protect host-kernel & network-fabric from direct exposure to untrusted
packet data-structures.
* Support for multiple distributions of Linux.
* Minimal third-party software maintenance burden.
* Ability to coexist with the existing network stack and ethernet virtual
devices in the event that an application's specific requirements cannot be
met by this design.
DESIGN
------
The Berkeley sockets coprocessor is a virtual PCI device which has the ability
to offload socket activity from an unmodified application at the BSD sockets
layer (Layer 4). Offloaded socket requests bypass the local operating system's
networking stack entirely via the card and are relayed into the VMM
(Virtual Machine Manager) for processing. The VMM then passes the request to a
socket backend for handling. The difference between a socket backend and a
traditional VM ethernet backend is that the socket backend receives layer 4
socket (STREAM/DGRAM) requests instead of a multiplexed stream of layer 2
packets (ethernet) that must be interpreted by the host. This technique also
improves security isolation as the guest is no longer constructing packets which
are evaluated by the host or underlying network fabric; packet construction
happens in the host.
Lastly, pushing socket processing back into the host allows for host-side
control of the network protocols used, which limits the potential congestion
problems that can arise when various guests are using their own congestion
control algorithms.
================================================================================
+-----------------------------------------------------------------+
| |
guest | unmodified application |
userspace +-----------------------------------------------------------------+
| unmodified libc |
+-----------------------------------------------------------------+
| / \
| |
=========================== | ============================ | ===================
| |
\ / |
+------------------------------------------------------+
| socket core |
+----+============+------------------------------------+
| INET | | / \
guest +-----+------+ | |
kernel | TCP | UDP | | |
+-----+------+ | L4 reqs |
| NETDEV | | |
+------------+ | |
| virtio_net | \ / |
+------------+ +------------------+
| / \ | hw_socket |
| | +------------------+
| | | virtio_socket |
| | +------------------+
| | | / \
========================= | == | ====================== | ====== | =============
\ / | \ / |
host +---------------------+ +------------------------+
userspace | virtio net device | | virtio socket device |
(vmm) +---------------------+ +------------------------+
| ethernet backend | | socket backend |
+---------------------+ +------------------------+
| / \ | / \
L2 | | | | L4
packets | | \ / | requests
| | +-----------------------+
| | | Socket Handlers |
| | +-----------------------+
| | | / \
======================= | ==== | ===================== | ======= | =============
| | | |
host \ / | \ / |
kernel
================================================================================
One of the most appealing aspects of this design (to application developers) is
that this approach can be completely transparent to the application, provided
we're able to intercept the application's socket requests in such a way that we
do not impact performance in a negative fashion, yet retain the API semantics
the application expects. In the event that this design is not suitable for an
application, the virtual machine may be also fitted with a normal virtual
ethernet device in addition to the co-processor (as shown in the diagram above).
Since we wish to allow these paravirtualized sockets to coexist peacefully with
the existing Linux socket system, we've chosen to introduce the idea that a
socket can at some point transition from being managed by the O/S socket system
to a more enlightened 'hardware assisted' socket. The transition is managed by
a 'socket coprocessor' component which intercepts and gets first right of
refusal on handling certain global socket calls (connect, sendto, bind, etc...).
In this initial design, the policy on whether to transition a socket or not is
made by the virtual hardware, although we understand that further measurement
into operation latency is warranted.
In the event the determination is made to transition a socket to hw-assisted
mode, the socket is marked as being assisted by hardware, and all socket
operations are offloaded to hardware.
The following flag values have been added to struct socket (only visible within
the guest kernel):
* SOCK_HWASSIST
Indicates socket operations are handled by hardware
In order to support a variety of socket address families, addresses are
converted from their native socket family to an opaque string. Our initial
design formats these strings as URIs. The currently supported conversions are:
+----------+-------------+---------------------------------------------+
| Domain   | Type        | URI example conversion                      |
+----------+-------------+---------------------------------------------+
| AF_INET  | SOCK_STREAM | tcp://x.x.x.x:yyyy                          |
| AF_INET  | SOCK_DGRAM  | udp://x.x.x.x:yyyy                          |
| AF_INET6 | SOCK_STREAM | tcp6://aaaa:b:cccc:d:eeee:ffff:gggg:hhhh/ii |
| AF_INET6 | SOCK_DGRAM  | udp6://aaaa:b:cccc:d:eeee:ffff:gggg:hhhh/ii |
| AF_IPX   | SOCK_DGRAM  | ipx://xxxxxxxx.yyyyyyyyyy.zzzz              |
+----------+-------------+---------------------------------------------+
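As an illustration of the AF_INET rows in the conversion table, the
following is a minimal userspace sketch of turning a socket address into
such a URI string. The function name hwsock_inet_to_uri() and its error
convention are hypothetical and not taken from the patches; only standard
libc calls are used.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: render an AF_INET address as "tcp://a.b.c.d:port"
 * or "udp://a.b.c.d:port". Returns the string length, or -1 on error
 * (wrong family, unsupported socket type, or buffer too small).
 */
static int hwsock_inet_to_uri(const struct sockaddr_in *sin, int type,
			      char *buf, size_t len)
{
	char ip[INET_ADDRSTRLEN];
	const char *scheme;
	int n;

	if (sin->sin_family != AF_INET)
		return -1;

	if (type == SOCK_STREAM)
		scheme = "tcp";
	else if (type == SOCK_DGRAM)
		scheme = "udp";
	else
		return -1;

	/* Dotted-quad text form of the IPv4 address. */
	if (!inet_ntop(AF_INET, &sin->sin_addr, ip, sizeof(ip)))
		return -1;

	/* The port is stored in network byte order. */
	n = snprintf(buf, len, "%s://%s:%u", scheme, ip,
		     (unsigned int)ntohs(sin->sin_port));
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

A caller would fill in a struct sockaddr_in as usual and pass the socket
type alongside it; the other address families in the table would need
their own formatting rules.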
In order for the socket coprocessor to take control of a socket, hooks must be
added to the socket core. Our initial implementation hooks a number of functions
in the socket-core (too many), and after consideration we feel we can reduce it
down considerably by managing the socket 'ops' pointers.
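To make the 'ops' pointer idea concrete, here is a small userspace mock-up
of a socket that switches from a software ops table to a hardware-assisted
one on its first qualifying call. Everything in it is an assumption made
for illustration: the mock_* structures are not the kernel's struct socket
or struct proto_ops, MOCK_SOCK_HWASSIST merely stands in for SOCK_HWASSIST,
and coprocessor_wants() is a placeholder for the policy decision that the
design leaves to the virtual hardware. Locking is ignored entirely.

```c
#include <string.h>

struct mock_socket;

/* Mock ops table; the real one has many more entries. */
struct mock_proto_ops {
	const char *name;
	int (*connect)(struct mock_socket *sock, const char *uri);
};

#define MOCK_SOCK_HWASSIST 0x1UL	/* stand-in for SOCK_HWASSIST */

struct mock_socket {
	unsigned long flags;
	const struct mock_proto_ops *ops;
};

/* Returns 0 to signal "handled by the normal network stack". */
static int sw_connect(struct mock_socket *sock, const char *uri)
{
	(void)sock;
	(void)uri;
	return 0;
}

/* Returns 1 to signal "would be relayed to the VMM". */
static int hw_connect(struct mock_socket *sock, const char *uri)
{
	(void)sock;
	(void)uri;
	return 1;
}

static const struct mock_proto_ops sw_ops = { "inet", sw_connect };
static const struct mock_proto_ops hw_ops = { "hwsock", hw_connect };

/* Placeholder policy: accept only tcp:// URIs. */
static int coprocessor_wants(const char *uri)
{
	return uri && strncmp(uri, "tcp://", 6) == 0;
}

static void mock_socket_init(struct mock_socket *sock)
{
	sock->flags = 0;
	sock->ops = &sw_ops;
}

/*
 * Single hook: offer the socket to the coprocessor once. If it accepts,
 * mark the socket and swap the ops table so that later calls dispatch
 * to the hardware path without any further hooks.
 */
static int mock_connect(struct mock_socket *sock, const char *uri)
{
	if (!(sock->flags & MOCK_SOCK_HWASSIST) && coprocessor_wants(uri)) {
		sock->flags |= MOCK_SOCK_HWASSIST;
		sock->ops = &hw_ops;
	}
	return sock->ops->connect(sock, uri);
}
```

The point of the sketch is that once the table is swapped, only the entry
points where the transition can happen need a hook; every other operation
reaches the hardware path through the normal ops dispatch.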
ALTERNATIVE STRATEGIES
----------------------
An alternative strategy for providing similar functionality involves either
modifying glibc or using LD_PRELOAD tricks to intercept socket calls. We were
forced to rule this out due to the complexity (and fragility) involved with
attempting to maintain a general solution compatible across various
distributions where platform-libraries differ.
CAVEATS
-------
* We're currently hooked into too many socket calls. We should be able to
reduce the number of hooks to 3 (__sock_create(), sys_connect(), sys_bind()).
* Our 'hw_socket' component should be folded into a netdev so we can leverage
NAPI.
* We don't handle SOCK_SEQPACKET, SOCK_RAW, SOCK_RDM, or SOCK_PACKET sockets.
* We don't currently have support for /proc/net. Our current plan is to
add '/proc/net/hwsock' (filename TBD) and add support for these sockets
to the net-tools packages (netstat & friends), rather than muck around with
plumbing hardware-assisted socket info into '/proc/net/tcp' and
'/proc/net/udp'.
* We don't currently have SOCK_DGRAM support implemented (work in progress).
* We have insufficient integration testing in place (work in progress).
^ permalink raw reply
* Re:
From: San Mehat @ 2011-08-18 22:08 UTC (permalink / raw)
To: davem, mst, rusty
Cc: linux-kernel, virtualization, netdev, digitaleric, mikew, miche,
maccarro
In-Reply-To: <20110818220732.459185C80B@san.sea.corp.google.com>
Pls disregard in favor of the one with an actual subject line :P
-san
--
San Mehat | Staff Software Engineer | san@google.com | 415-366-6172
^ permalink raw reply
* Re: [RFC 0/0] Introducing a generic socket offload framework
From: Alan Cox @ 2011-08-18 22:57 UTC (permalink / raw)
To: San Mehat
Cc: davem, mst, rusty, linux-kernel, virtualization, netdev,
digitaleric, mikew, miche, maccarro
In-Reply-To: <20110818220756.5C93E5C80B@san.sea.corp.google.com>
> The Berkeley sockets coprocessor is a virtual PCI device which has the ability
> to offload socket activity from an unmodified application at the BSD sockets
Ok I think there is an important question here. Why is this being
designed for a specific virtual interface? Unix has always had the notion
that socket operations can be in part generic and that you can pass a
properly designed program a socket without any notion of what it is for.
> Lastly, pushing socket processing back into the host allows for host-side
> control of the network protocols used, which limits the potential congestion
> problems that can arise when various guests are using their own congestion
> control algorithms.
Does that not depend which side does the congestion and who parcels out
buffers ?
> Since we wish to allow these paravirtualized sockets to coexist peacefully with
> the existing Linux socket system, we've chosen to introduce the idea that a
> socket can at some point transition from being managed by the O/S socket system
> to a more enlightened 'hardware assisted' socket. The transition is managed by
> a 'socket coprocessor' component which intercepts and gets first right of
> refusal on handling certain global socket calls (connect, sendto, bind, etc...).
> In this initial design, the policy on whether to transition a socket or not is
> made by the virtual hardware, although we understand that further measurement
> into operation latency is warranted.
Q: what happens to in process socket syscalls in another thread?
That's always been the ugly part in these cases, whether by intercepting
or by swapping file operations on an object.
> * SOCK_HWASSIST
> Indicates socket operations are handled by hardware
This guest-only view means you can't use the abstraction for local
sockets too.
> In order to support a variety of socket address families, addresses are
> converted from their native socket family to an opaque string. Our initial
> design formats these strings as URIs. The currently supported conversions are:
That makes a lot of sense to me, because it's a well understood
abstraction and you can offload other stuff to this kind of generic
socket including things like http protocol acceleration, SSL and so on.
Plus it's always been annoying that you can't open a socket, but a URI
interface solves that...
> * We don't handle SOCK_SEQPACKET, SOCK_RAW, SOCK_RDM, or SOCK_PACKET sockets.
But there is no reason SEQPACKET and RDM couldn't be added I assume?
Ok other questions
Suppose instead you just add an abstracted socket interface of
AF_SOMETHING, PF_URI
it would be easy to convert programs. It would be easier to write
properly generic programs. It would be easy to write some small helpers
that are a good deal less insane than the existing inet ones. At that point
you could turn the problem on its head. Instead of 'borrowing' sockets
for a fairly specific concept of hw assist you ask the reverse question:
who can accelerate this URI, be it some kind of virtual machine interface,
something funky like raw data over infiniband, or plain old 'use the
TCP/IP stack'.
Your decision making code is going to be interesting but it only has to
make the decision once in simple cases.
And yes there are still the complicated cases such as 'the routing table
has changed from virtual host to via siberia now what' but I don't believe
your proposal addresses that either.
Alan
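[Editorial sketch, not part of Alan's message.] The "turn the problem on
its head" idea above can be pictured as a one-time lookup: each possible
accelerator is asked whether it can take a given URI, and the plain TCP/IP
stack is the fallback. The Python below is only an illustration under
stated assumptions. The Backend class, pick_backend() and the tcp:// and
udp:// URI forms are all hypothetical; the thread does not show the URI
formats the RFC actually uses.

```python
from urllib.parse import urlsplit


class Backend:
    """Hypothetical transport that may claim a URI (not an API from the RFC)."""

    def __init__(self, name, schemes, hosts=None):
        self.name = name
        self.schemes = set(schemes)
        # hosts=None means "any host"; otherwise only the listed hosts.
        self.hosts = None if hosts is None else set(hosts)

    def can_handle(self, uri):
        parts = urlsplit(uri)
        if parts.scheme not in self.schemes:
            return False
        return self.hosts is None or parts.hostname in self.hosts


def pick_backend(uri, accelerators, fallback):
    """Return the first accelerator that claims the URI, else the fallback.

    The decision is made once, when the socket is opened, which matches
    the "simple cases" mentioned above. Route changes are not handled.
    """
    for backend in accelerators:
        if backend.can_handle(uri):
            return backend
    return fallback
```

For example, with a virtual-machine backend that only claims one host
address and a generic stack backend as the fallback,
pick_backend("tcp://10.0.0.5:80", [virt], stack) would return virt, while
any other destination would fall through to stack.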
^ permalink raw reply
* Re: [RFC 0/0] Introducing a generic socket offload framework
From: Alan Cox @ 2011-08-18 23:03 UTC (permalink / raw)
To: Alan Cox
Cc: San Mehat, davem, mst, rusty, linux-kernel, virtualization,
netdev, digitaleric, mikew, miche, maccarro
In-Reply-To: <20110818235719.50365b0b@lxorguk.ukuu.org.uk>
> Q: what happens to in process socket syscalls in another thread?
> That's always been the ugly part in these cases, whether by intercepting
> or by swapping file operations on an object.
Sorry I meant "in progress" 8)
^ permalink raw reply
* Re: Move interface across network namespaces
From: Eric W. Biederman @ 2011-08-18 23:12 UTC (permalink / raw)
To: Renato Westphal; +Cc: netdev, kaber, David Lamparter
In-Reply-To: <CAChaeg=1WSU0webhYDpQcnq3cJje17FQ5Z8rypD8sdkyzcLT-g@mail.gmail.com>
Renato Westphal <renatowestphal@gmail.com> writes:
> I forgot to mention that I'm using kernel v2.6.35 (with a lot of
> backports). For future reference, the commit 3b27e105550f7c4a ("netns:
> keep vlan slaves on master netns move", merged into v2.6.37-rc1) fixes
> this problem.
Which makes me silly as I now remember reviewing that patch.
>>>> * The target network namespace sends a RTM_NEWLINK netlink message
>>>> when an interface is moved to it. On the other hand, the source
>>>> network namespace doesn't send a RTM_DELLINK message when an
>>>> interface is moved from it. This is very annoying because user space
>>>> applications (such as zebra) can't detect some interface moving
>>>> operations and then get into an inconsistent state. Does anyone know
>>>> if there's a workaround for this?
>>>
>>> Not getting RTM_DELLINK is a bug. The device registration and
>>> unregistration code has changed since dev_change_net_namespace was
>>> written and apparently one of the changes failed to update
>>> dev_change_net_namespace.
>>>
>>
>> Good, that makes a lot more sense. In the kernel 2.6.32.43 the
>> RTM_DELLINK netlink message is sent when a network interface is moved
>> from a network namespace. The same doesn't happen in the kernel
>> 2.6.35.13. I'll try to isolate the problem some more.
>
> Well, this regression was introduced by commit a2835763e130c343ac,
> which was merged into v2.6.34. Reverting parts of this commit makes
> the problem go away but breaks the support of "specifying device flags
> during device creation". I don't know the best way to fix this... any
> ideas?
Everything going through dev_change_net_namespace already needs to be
in the initialized state. So it looks like we just need to do:
Does the patch below work for you?
Eric
---
diff --git a/net/core/dev.c b/net/core/dev.c
index 17d67b5..bfbde69 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6108,6 +6108,8 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
 	call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, dev);
 
+	rtmsg_ifinfo(RTM_DELLINK, dev, ~0U);
+
 	/*
 	 * Flush the unicast and multicast chains
 	 */
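
[Editorial sketch, not part of Eric's patch.] A user space listener such
as zebra sees these events as rtnetlink messages on a NETLINK_ROUTE socket
subscribed to the RTMGRP_LINK multicast group. The Python below is a
minimal illustration of how the nlmsghdr of each received message could be
decoded to tell RTM_NEWLINK from RTM_DELLINK, which is one way to check
whether the patch makes the source namespace emit RTM_DELLINK. The
function names are made up for this example; the numeric constants are the
values from <linux/rtnetlink.h>.

```python
import socket
import struct

# Values from <linux/rtnetlink.h>.
RTM_NEWLINK = 16
RTM_DELLINK = 17
RTMGRP_LINK = 1

# struct nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid
# (host byte order, no extra padding).
NLMSGHDR = struct.Struct("=IHHII")


def iter_link_events(data):
    """Yield 'new' or 'del' for each RTM_NEWLINK / RTM_DELLINK in a buffer."""
    offset = 0
    while offset + NLMSGHDR.size <= len(data):
        msg_len, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(data, offset)
        if msg_len < NLMSGHDR.size or offset + msg_len > len(data):
            break  # truncated or malformed message; stop parsing
        if msg_type == RTM_NEWLINK:
            yield "new"
        elif msg_type == RTM_DELLINK:
            yield "del"
        # Each message is padded to a 4-byte boundary (NLMSG_ALIGN).
        offset += (msg_len + 3) & ~3


def watch_links():
    """Illustrative helper: print link events for the current netns (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, RTMGRP_LINK))  # pid 0 lets the kernel pick; mask selects link events
    while True:
        for event in iter_link_events(sock.recv(65535)):
            print(event)
```

Running watch_links() in the source namespace while an interface is moved
out of it should print "del" once the RTM_DELLINK notification is sent.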
^ permalink raw reply related
* Re: [RFC 0/0] Introducing a generic socket offload framework
From: San Mehat @ 2011-08-18 23:18 UTC (permalink / raw)
To: Alan Cox
Cc: davem, mst, rusty, linux-kernel, virtualization, netdev,
digitaleric, mikew, miche, maccarro
In-Reply-To: <20110818235719.50365b0b@lxorguk.ukuu.org.uk>
On Thu, Aug 18, 2011 at 3:57 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>> The Berkeley sockets coprocessor is a virtual PCI device which has the ability
>> to offload socket activity from an unmodified application at the BSD sockets
>
> Ok I think there is an important question here. Why is this being
> designed for a specific virtual interface? Unix has always had the notion
> that socket operations can be in part generic and that you can pass a
> properly designed program a socket without any notion of what it is for.
Sorry Alan if I wasn't clear, but I'm not quite sure what you're asking...
If you're asking 'why have you only spec'ed out a virtual interface for
this' then my answer would be 'but of course you could design this in real
hardware and have a proper driver :)'. If you'd prefer that I call that out
specifically I'm happy to do so.
I have no desire to change the 'genericness' of sockets.. just the
opposite - I wish to introduce the notion that sockets (can be) completely
generic (when offloaded) as far as the guest is concerned.
>
>> Lastly, pushing socket processing back into the host allows for host-side
>> control of the network protocols used, which limits the potential congestion
>> problems that can arise when various guests are using their own congestion
>> control algorithms.
>
> Does that not depend on which side does the congestion control and who
> parcels out buffers?
It does, and it does.
>
>> Since we wish to allow these paravirtualized sockets to coexist peacefully with
>> the existing Linux socket system, we've chosen to introduce the idea that a
>> socket can at some point transition from being managed by the O/S socket system
>> to a more enlightened 'hardware assisted' socket. The transition is managed by
>> a 'socket coprocessor' component which intercepts and gets first right of
>> refusal on handling certain global socket calls (connect, sendto, bind, etc...).
>> In this initial design, the policy on whether to transition a socket or not is
>> made by the virtual hardware, although we understand that further measurement
>> into operation latency is warranted.
>
> Q: what happens to in process socket syscalls in another thread?
> That's always been the ugly part in these cases, whether by intercepting
> or by swapping file operations on an object.
>
>> * SOCK_HWASSIST
>> Indicates socket operations are handled by hardware
>
> This guest-only view means you can't use the abstraction for local
> sockets too.
>
To be honest, the way we're attempting to integrate is in such a way that
you *could* offload AF_LOCAL sockets... but that world gets a bit too much
like the 'Twilight Zone' for my current liking..
>> In order to support a variety of socket address families, addresses are
>> converted from their native socket family to an opaque string. Our initial
>> design formats these strings as URIs. The currently supported conversions are:
>
> That makes a lot of sense to me, because it's a well understood
> abstraction and you can offload other stuff to this kind of generic
> socket including things like http protocol acceleration, SSL and so on.
>
> Plus it's always been annoying that you can't open a socket, but a URI
> interface solves that...
Indeed.
>
>> * We don't handle SOCK_SEQPACKET, SOCK_RAW, SOCK_RDM, or SOCK_PACKET sockets.
>
> But there is no reason SEQPACKET and RDM couldn't be added I assume?
No reason I can think of - we just did not have a specific requirement
for it at the time.
>
> Ok other questions
>
> Suppose instead you just add an abstracted socket interface of
>
> AF_SOMETHING, PF_URI
Mike Waychison and I were saving the 'PF_URI' discussion for a future date,
but indeed we're on the same wavelength :). Our initial requirements are
for an 'extremely minimal burden of support' on the userspace environments,
so we decided to open up a separate discussion on PF_URI.
>
> it would be easy to convert programs. It would be easier to write
> properly generic programs. It would be easy to write some small helpers
> that are a good deal less insane than the existing inet ones. At that point
> you could turn the problem on its head. Instead of 'borrowing' sockets
> for a fairly specific concept of hw assist you ask the reverse question:
> who can accelerate this URI, be it some kind of virtual machine interface,
> something funky like raw data over infiniband, or plain old 'use the
> TCP/IP stack'.
Completely agree.
>
> Your decision making code is going to be interesting but it only has to
> make the decision once in simple cases.
Yup.
>
> And yes there are still the complicated cases such as 'the routing table
> has changed from virtual host to via siberia now what' but I don't believe
> your proposal addresses that either.
Can you be more specific? If you mean solving the 'keeping your TCP
connections open to non-virtual endpoints across a migration (or whatever)'
then no, it doesn't :)
>
> Alan
>
Thanks man,
-san
--
San Mehat | Staff Software Engineer | san@google.com | 415-366-6172
^ permalink raw reply
* Re: [PATCH] PM: add macro to test for runtime PM events
From: Greg KH @ 2011-08-18 23:26 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Alan Stern, Linux-pm mailing list, USB list, netdev,
linux-bluetooth, linux-input, Takashi Iwai
In-Reply-To: <201108182252.10364.rjw@sisk.pl>
On Thu, Aug 18, 2011 at 10:52:10PM +0200, Rafael J. Wysocki wrote:
> Hi,
>
> On Thursday, August 18, 2011, Alan Stern wrote:
> > This patch (as1482) adds a macro for testing whether or not a
> > pm_message value represents an autosuspend or autoresume (i.e., a
> > runtime PM) event. Encapsulating this notion seems preferable to
> > open-coding the test all over the place.
> >
> > Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
> >
> > ---
> >
> > This is a minor change in the PM API, but most of the affected files
> > are in the USB subsystem. Therefore either Rafael or Greg might prefer
> > to accept this patch.
>
> I can take the patch if Greg is fine with that.
Fine with me:
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
^ permalink raw reply
* linux-next: manual merge of the wireless tree with the net tree
From: Stephen Rothwell @ 2011-08-19 0:59 UTC (permalink / raw)
To: John W. Linville
Cc: linux-next, linux-kernel, Jiri Pirko, David Miller, netdev
Hi John,
Today's linux-next merge of the wireless tree got a conflict in
drivers/staging/ath6kl/os/linux/ar6000_drv.c between commit afc4b13df143
("net: remove use of ndo_set_multicast_list in drivers") from the net
tree and commit af2bf4b4ee58 ("staging: remove ath6kl") from the wireless
tree.
I just removed the file.
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
^ permalink raw reply
* Re: linux-next: manual merge of the wireless tree with the net tree
From: John W. Linville @ 2011-08-19 1:03 UTC (permalink / raw)
To: Stephen Rothwell
Cc: linux-next, linux-kernel, Jiri Pirko, David Miller, netdev
In-Reply-To: <20110819105946.a46aa8a997f466d8a2e7f2f3@canb.auug.org.au>
On Fri, Aug 19, 2011 at 10:59:46AM +1000, Stephen Rothwell wrote:
> Hi John,
>
> Today's linux-next merge of the wireless tree got a conflict in
> drivers/staging/ath6kl/os/linux/ar6000_drv.c between commit afc4b13df143
> ("net: remove use of ndo_set_multicast_list in drivers") from the net
> tree and commit af2bf4b4ee58 ("staging: remove ath6kl") from the wireless
> tree.
>
> I just removed the file.
Cool, thanks. I imagine that any more "bombing runs" that touch
ath6kl in staging can be resolved in the same fashion.
Thanks!
John
--
John W. Linville Someday the world will need a hero, and you
linville@tuxdriver.com might be all we have. Be ready.
^ permalink raw reply
* Re: [PATCH] fix IBM EMAC driver after rename.
From: Tony Breeds @ 2011-08-19 1:21 UTC (permalink / raw)
To: Oliver Hartkopp; +Cc: Netdev List, David Miller, Jeff Kirsher
In-Reply-To: <4E4CB653.9080907@hartkopp.net>
On Thu, Aug 18, 2011 at 08:50:59AM +0200, Oliver Hartkopp wrote:
> What about renaming of newemac -> emac in this part of the Makefile?
Sure, see version 2.
Yours Tony
^ permalink raw reply