* Re: [PATCH 3/3] virtio-blk: Add bio-based IO path for virtio-blk
From: Ronen Hod @ 2012-07-03 14:22 UTC
To: dlaor
Cc: kvm, Michael S. Tsirkin, linux-kernel, virtualization,
Christoph Hellwig
In-Reply-To: <4FDF0DA7.40604@redhat.com>
On 06/18/2012 02:14 PM, Dor Laor wrote:
> On 06/18/2012 01:05 PM, Rusty Russell wrote:
>> On Mon, 18 Jun 2012 16:03:23 +0800, Asias He<asias@redhat.com> wrote:
>>> On 06/18/2012 03:46 PM, Rusty Russell wrote:
>>>> On Mon, 18 Jun 2012 14:53:10 +0800, Asias He<asias@redhat.com> wrote:
>>>>> This patch introduces bio-based IO path for virtio-blk.
>>>>
>>>> Why make it optional?
>>>
>>> The request-based IO path is useful for users who do not want to bypass the
>>> IO scheduler in the guest kernel, e.g. users with spinning disks. Users
>>> with fast disk devices, e.g. SSDs, can use the bio-based IO path.
>>
>> Users using a spinning disk still get IO scheduling in the host though.
>> What benefit is there in doing it in the guest as well?
>
> The IO scheduler waits for requests to merge and thus batches IOs together. It's not important w.r.t. spinning disks since the host can do it, but it causes far fewer vmexits, which is the key issue for VMs.
Does it make sense to use the guest's I/O scheduler at all?
- It is not aware of the physical (spinning) disk layout.
- It is not aware of all the host's disk pending requests.
It does have a good side-effect - batching of requests.
Ronen.
>
>>
>> Cheers,
>> Rusty.
>> _______________________________________________
>> Virtualization mailing list
>> Virtualization@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
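Dor's batching argument can be illustrated with a toy model. The following is hypothetical userspace C, not virtio-blk code: it assumes every request that reaches the virtqueue costs exactly one host notification (a "kick", and potentially a vmexit), and it only back-merges requests that are sector-contiguous in submission order, which is far simpler than any real guest elevator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request: starting sector and length in sectors. */
struct req {
    unsigned long sector;
    unsigned long nr_sectors;
};

/*
 * Count kicks when a request that starts exactly where the pending one
 * ends is back-merged into it instead of being submitted separately.
 * Without merging, the same model costs one kick per request, i.e. n.
 */
static size_t kicks_with_merging(const struct req *reqs, size_t n)
{
    size_t kicks = 0;
    unsigned long next_sector = 0;

    for (size_t i = 0; i < n; i++) {
        if (kicks > 0 && reqs[i].sector == next_sector) {
            /* Contiguous: extend the pending request, no extra kick. */
            next_sector += reqs[i].nr_sectors;
            continue;
        }
        /* Not mergeable: submit a new request, costing one kick. */
        kicks++;
        next_sector = reqs[i].sector + reqs[i].nr_sectors;
    }
    return kicks;
}
```

In this model, eight sequential 8-sector writes starting at sector 0 cost eight kicks unmerged but a single kick merged, while out-of-order requests such as sectors 0, 100 and 8 gain nothing.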
* Re: [PATCH 3/3] virtio-blk: Add bio-based IO path for virtio-blk
From: Dor Laor @ 2012-07-03 14:28 UTC
To: Ronen Hod
Cc: kvm, Michael S. Tsirkin, linux-kernel, virtualization,
Christoph Hellwig
In-Reply-To: <4FF3001C.9020706@redhat.com>
On 07/03/2012 05:22 PM, Ronen Hod wrote:
> On 06/18/2012 02:14 PM, Dor Laor wrote:
>> On 06/18/2012 01:05 PM, Rusty Russell wrote:
>>> On Mon, 18 Jun 2012 16:03:23 +0800, Asias He<asias@redhat.com> wrote:
>>>> On 06/18/2012 03:46 PM, Rusty Russell wrote:
>>>>> On Mon, 18 Jun 2012 14:53:10 +0800, Asias He<asias@redhat.com> wrote:
>>>>>> This patch introduces bio-based IO path for virtio-blk.
>>>>>
>>>>> Why make it optional?
>>>>
>>>> The request-based IO path is useful for users who do not want to bypass the
>>>> IO scheduler in the guest kernel, e.g. users with spinning disks. Users
>>>> with fast disk devices, e.g. SSDs, can use the bio-based IO path.
>>>
>>> Users using a spinning disk still get IO scheduling in the host though.
>>> What benefit is there in doing it in the guest as well?
>>
>> The IO scheduler waits for requests to merge and thus batches IOs
>> together. It's not important w.r.t. spinning disks since the host can
>> do it, but it causes far fewer vmexits, which is the key issue for VMs.
>
> Does it make sense to use the guest's I/O scheduler at all?
That's the reason we have a noop io scheduler.
> - It is not aware of the physical (spinning) disk layout.
> - It is not aware of all the host's disk pending requests.
> It does have a good side-effect - batching of requests.
>
> Ronen.
>
>>
>>>
>>> Cheers,
>>> Rusty.
>>
>
* RE: [PATCH 00/13] drivers: hv: kvp
From: KY Srinivasan @ 2012-07-03 15:24 UTC
To: Ben Hutchings
Cc: Olaf Hering, Greg KH, apw@canonical.com,
devel@linuxdriverproject.org, virtualization@lists.osdl.org,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20120702195721.GE1894@decadent.org.uk>
> -----Original Message-----
> From: Ben Hutchings [mailto:ben@decadent.org.uk]
> Sent: Monday, July 02, 2012 3:57 PM
> To: KY Srinivasan
> Cc: Olaf Hering; Greg KH; apw@canonical.com; devel@linuxdriverproject.org;
> virtualization@lists.osdl.org; linux-kernel@vger.kernel.org;
> netdev@vger.kernel.org
> Subject: Re: [PATCH 00/13] drivers: hv: kvp
>
> On Mon, Jul 02, 2012 at 03:22:25PM +0000, KY Srinivasan wrote:
> >
> >
> > > -----Original Message-----
> > > From: Olaf Hering [mailto:olaf@aepfle.de]
> > > Sent: Thursday, June 28, 2012 10:24 AM
> > > To: KY Srinivasan
> > > Cc: Greg KH; apw@canonical.com; devel@linuxdriverproject.org;
> > > virtualization@lists.osdl.org; linux-kernel@vger.kernel.org
> > > Subject: Re: [PATCH 00/13] drivers: hv: kvp
> > >
> > > On Tue, Jun 26, KY Srinivasan wrote:
> > >
> > > > > From: Greg KH [mailto:gregkh@linuxfoundation.org]
> > > > > The fact that it was Red Hat specific was the main part, this should be
> > > > > done in a standard way, with standard tools, right?
> > > >
> > > > The reason I asked this question was to make sure I address these
> > > > issues in addition to whatever I am debugging now. I use the standard
> > > > tools and calls to retrieve all the IP configuration. As I look at
> > > > each distribution, the files in which they keep persistent IP
> > > > configuration information differ, and that is the reason I chose to
> > > > start with Red Hat. If there is a standard way to store the
> > > > configuration, I will do that.
> > >
> > >
> > > KY,
> > >
> > > instead of using system() in kvp_get_ipconfig_info and kvp_set_ip_info,
> > > wouldn't it be easier to call an external helper script which does all
> > > the distribution specific work? Just define some API to pass values to
> > > the script, and something to read values collected by the script back
> > > into the daemon.
> >
> > On the "Get" side I mostly use standard commands/APIs to get all the
> > information:
> >
> > 1) IP address information and subnet mask: getifaddrs()
> > 2) DNS information: Parsing /etc/resolv.conf
> > 3) /sbin/ip command for all the routing information
>
> If you're interested in the *current* configuration then (1) and (3)
> are OK but you should really use the rtnetlink API.
>
> However, I suspect that Hyper-V assumes that current and persistent
> configuration are the same thing, which is obviously not true in
> general on Linux. But if NetworkManager is running then you can
> assume they are.
I am only interested in the currently active information. Why do you
recommend the rtnetlink API over the "ip" command? If I am not
mistaken, the ip command uses netlink to get the information.
>
> > 4) Parse /etc/sysconfig/network-scripts/ifcfg-ethx for boot protocol
This is the only information that requires parsing a distro-specific configuration file. Do
you have any suggestion on how I may get this information in a distro-independent way?
> >
> > As you can see, everything but the boot protocol is gathered using standard,
> > distro-independent mechanisms. I was looking at the NetworkManager CLI and it
> > looks like I could gather all the information except the boot protocol
> > information. I am not sure how to gather the boot protocol information in a
> > distro-independent fashion.
> >
> > On the SET side, I need to persistently store the settings in an appropriate
> > configuration file and flush these settings down so that the interface is
> > appropriately configured. It is here that I am struggling to find a
> > distro-independent way of doing things. It would be great if I could use the
> > NetworkManager CLI (nmcli) to accomplish this. Any help here would be greatly
> > appreciated.
> [...]
>
> What was wrong with the NetworkManager D-Bus API I pointed you at?
> I don't see how it makes sense to use nmcli as an API.
I saw some documentation that claimed nmcli could be used to accomplish
everything that can be done through the GUI. I am looking for a portable way
to configure an interface. If nmcli can do that, I would use it. With
regard to the D-Bus API, I took a cursory look at the APIs. I am still evaluating
my options.
Regards,
K. Y
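Item (2) of K. Y.'s "Get" list, reading DNS servers from /etc/resolv.conf, can be sketched in a few lines. parse_nameservers() below is hypothetical, not taken from the KVP daemon; it works on resolv.conf-formatted text the caller has already read into memory, keeps only nameserver lines, and treats lines starting with '#' or ';' as comments, as resolv.conf(5) describes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_NS 8   /* hypothetical cap on collected nameservers */
#define NS_LEN 64  /* room for an IPv6 literal plus NUL */

/*
 * Hypothetical helper: copy the address of each "nameserver" line in
 * resolv.conf-formatted text into ns[]. Lines beginning with '#' or ';'
 * are comments. Returns the number of addresses copied (at most max).
 */
static size_t parse_nameservers(const char *text, char ns[][NS_LEN], size_t max)
{
    size_t found = 0;
    const char *line = text;

    while (line != NULL && *line != '\0' && found < max) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        char buf[256];

        /* Work on a NUL-terminated copy of the current line. */
        if (len >= sizeof(buf))
            len = sizeof(buf) - 1;
        memcpy(buf, line, len);
        buf[len] = '\0';

        /* Keyword must be followed by whitespace, e.g. "nameserver 1.2.3.4". */
        if (buf[0] != '#' && buf[0] != ';' &&
            strncmp(buf, "nameserver", 10) == 0 &&
            isspace((unsigned char)buf[10])) {
            char addr[NS_LEN];

            if (sscanf(buf + 10, "%63s", addr) == 1) {
                strcpy(ns[found], addr);  /* addr holds at most 63 chars */
                found++;
            }
        }
        line = eol ? eol + 1 : NULL;
    }
    return found;
}
```

Reading the real file (for example with fopen() and fread()) is left to the caller; the parsing itself does not depend on any distribution-specific layout.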
* Re: RFD: virtio balloon API use (was Re: [PATCH 5 of 5] virtio: expose added descriptors immediately)
From: Rafael Aquini @ 2012-07-03 16:26 UTC
To: Rusty Russell; +Cc: virtualization, linux-kernel, kvm, Michael S. Tsirkin
In-Reply-To: <87vci5wtzh.fsf@rustcorp.com.au>
On Tue, Jul 03, 2012 at 10:17:46AM +0930, Rusty Russell wrote:
> On Mon, 2 Jul 2012 13:08:19 -0300, Rafael Aquini <aquini@redhat.com> wrote:
> > As for 'locking in balloon', may I assume the approach I took for the compaction case
> > is OK and addresses these concerns of yours? If not, please do not hesitate to
> > give me your thoughts. I'm respinning a V3 series to address a couple
> > of extra nitpicks from the compaction standpoint, and I'd love to be able to
> > address any extra concern you might have on the balloon side of that work.
>
> It's orthogonal, though it looks like they clash textually :(
>
> I'll re-spin MST's patch on top of yours, and include both in my tree,
> otherwise linux-next will have to do the merge. But I'll await your
> push before pushing to Linus in the next merge window.
>
Thanks, Rusty.
I'll post the V3 series soon.
Cheers!
Rafael
> Thanks,
> Rusty.
* Re: [PATCH v2 1/4] mm: introduce compaction and migration for virtio ballooned pages
From: Rafael Aquini @ 2012-07-03 18:31 UTC
To: Minchan Kim
Cc: Rik van Riel, Michael S. Tsirkin, Konrad Rzeszutek Wilk,
linux-kernel, virtualization, linux-mm, Andi Kleen, Andrew Morton
In-Reply-To: <4FF0DEE2.5080200@kernel.org>
On Mon, Jul 02, 2012 at 08:36:02AM +0900, Minchan Kim wrote:
> On 06/30/2012 10:34 AM, Rafael Aquini wrote:
>
> >> void isolate_page_from_balloonlist(struct page* page)
> >> > {
> >> > page->mapping->a_ops->invalidatepage(page, 0);
> >> > }
> >> >
> >> > if (is_balloon_page(page) && (page_count(page) == 2)) {
> >> > isolate_page_from_balloonlist(page);
> >> > }
> >> >
> > Humm, my feelings on your approach here: just an unnecessary indirection that
> > doesn't bring the desired code readability improvement.
> > If the header comment on balloon_mapping->a_ops is not clear enough
> > about how those methods are used for ballooned pages:
> >
> > .....
> > /*
> > * Balloon pages special page->mapping.
> > * users must properly allocate and initialize an instance of balloon_mapping,
> > * and set it as the page->mapping for balloon enlisted page instances.
> > *
> > * address_space_operations necessary methods for ballooned pages:
> > * .migratepage - used to perform balloon's page migration (as is)
> > * .invalidatepage - used to isolate a page from balloon's page list
> > * .freepage - used to reinsert an isolated page to balloon's page list
> > */
> > struct address_space *balloon_mapping;
> > EXPORT_SYMBOL_GPL(balloon_mapping);
> > .....
> >
> > I can add an extra comment, reminding folks of that usage, next to the
> > points where those callbacks are used in isolate_balloon_page() &
> > putback_balloon_page(). What do you think?
> >
> >
>
>
> I am not strongly against your approach.
> This trivial nitpick must not hold up your great work. :)
>
> Thanks!
>
>
Nah, I'm the one who should be thanking everyone else here. :)
On second thought, I decided to follow your suggestion on this one as well.
I'll be posting the re-spin soon.
Thanks Minchan!
Rafael
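Minchan's suggestion is to hide the a_ops indirection behind helpers named for what they do. A hypothetical userspace sketch of the pattern follows; fake_page, fake_mapping, fake_aops, putback_page_to_balloonlist() and the demo_* callbacks are invented stand-ins, and only isolate_page_from_balloonlist() is the name from Minchan's snippet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_page;

/* Hypothetical subset of address_space_operations used by the balloon. */
struct fake_aops {
    void (*invalidatepage)(struct fake_page *page, unsigned long offset);
    void (*freepage)(struct fake_page *page);
};

struct fake_mapping {
    const struct fake_aops *a_ops;
};

struct fake_page {
    struct fake_mapping *mapping;
    bool on_balloon_list;
};

/* Named wrappers: the call site says what happens, not which hook runs. */
static void isolate_page_from_balloonlist(struct fake_page *page)
{
    page->mapping->a_ops->invalidatepage(page, 0);
}

static void putback_page_to_balloonlist(struct fake_page *page)
{
    page->mapping->a_ops->freepage(page);
}

/* Driver-side callbacks: in this model they only flip list membership. */
static void demo_isolate(struct fake_page *page, unsigned long offset)
{
    (void)offset;
    page->on_balloon_list = false;
}

static void demo_putback(struct fake_page *page)
{
    page->on_balloon_list = true;
}

static const struct fake_aops demo_aops = {
    .invalidatepage = demo_isolate,
    .freepage = demo_putback,
};
```

The v3 posting adopts this shape as __isolate_balloon_page() and __putback_balloon_page() in mm/compaction.c.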
* [PATCH v3 0/4] make balloon pages movable by compaction
From: Rafael Aquini @ 2012-07-03 23:48 UTC
To: linux-mm
Cc: Rik van Riel, Rafael Aquini, Konrad Rzeszutek Wilk,
Michael S. Tsirkin, linux-kernel, virtualization, Minchan Kim,
Andi Kleen, Andrew Morton
Memory fragmentation introduced by ballooning might significantly reduce
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.
This patchset follows the main idea discussed at the 2012 LSFMMS session:
"Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/
to introduce the required changes to the virtio_balloon driver, as well as
changes to the core compaction & migration bits, in order to make those
subsystems aware of ballooned pages and allow memory balloon pages to become
movable within a guest, thus avoiding the aforementioned fragmentation issue.
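The fragmentation argument can be checked with a toy model. Assumptions for this hypothetical sketch: 4KB base pages (so a 2MB block is 512 pages), a 2MB-aligned block can back a transparent huge page only if it holds no balloon page, and compaction is idealized as packing every balloon page to the bottom of memory. The real patches can fail to move some pages, as the compact_balloon_failed counter in the test results shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_2MB 512  /* 2MB / 4KB, assuming 4KB base pages */

/*
 * Count 2MB-aligned blocks that contain no balloon page.
 * balloon[i] is true when page i is held by the balloon.
 */
static size_t huge_candidates(const bool *balloon, size_t nr_pages)
{
    size_t blocks = 0;

    for (size_t start = 0; start + PAGES_PER_2MB <= nr_pages;
         start += PAGES_PER_2MB) {
        bool clean = true;

        for (size_t i = start; i < start + PAGES_PER_2MB; i++) {
            if (balloon[i]) {
                clean = false;
                break;
            }
        }
        if (clean)
            blocks++;
    }
    return blocks;
}

/*
 * Idealized compaction: keep the same number of balloon pages but pack
 * them at the start of memory, as if every one of them could be migrated.
 */
static void pack_balloon_pages(bool *balloon, size_t nr_pages)
{
    size_t n = 0;

    for (size_t i = 0; i < nr_pages; i++)
        if (balloon[i])
            n++;
    for (size_t i = 0; i < nr_pages; i++)
        balloon[i] = (i < n);
}
```

With one balloon page in each of eight 2MB blocks, no block is usable for a huge page in this model; after packing the same eight pages, seven of the eight blocks are.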
Rafael Aquini (4):
mm: introduce compaction and migration for virtio ballooned pages
virtio_balloon: handle concurrent accesses to virtio_balloon struct
elements
virtio_balloon: introduce migration primitives to balloon pages
mm: add vm event counters for balloon pages compaction
drivers/virtio/virtio_balloon.c | 142 +++++++++++++++++++++++++++++++++++----
include/linux/mm.h | 15 +++++
include/linux/virtio_balloon.h | 4 ++
include/linux/vm_event_item.h | 2 +
mm/compaction.c | 127 ++++++++++++++++++++++++++++------
mm/migrate.c | 32 ++++++++-
mm/vmstat.c | 4 ++
7 files changed, 293 insertions(+), 33 deletions(-)
Changelog:
V3: address reviewers' nitpick suggestions (Mel, Minchan)
V2: address Mel Gorman's review comments
Preliminary test results:
(2 VCPU 1024MB RAM KVM guest running 3.5.0_rc5+)
* 64MB balloon:
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 0
compact_pages_moved 0
compact_pagemigrate_failed 0
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 0
compact_balloon_failed 0
compact_balloon_isolated 0
compact_balloon_freed 0
[root@localhost ~]#
[root@localhost ~]# for i in $(seq 1 4); do echo 1 > /proc/sys/vm/compact_memory & done &>/dev/null
[1] Done echo 1 > /proc/sys/vm/compact_memory
[2] Done echo 1 > /proc/sys/vm/compact_memory
[3]- Done echo 1 > /proc/sys/vm/compact_memory
[4]+ Done echo 1 > /proc/sys/vm/compact_memory
[root@localhost ~]#
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 2717
compact_pages_moved 46697
compact_pagemigrate_failed 75
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 16384
compact_balloon_failed 0
compact_balloon_isolated 16384
compact_balloon_freed 16384
* 128MB balloon:
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 0
compact_pages_moved 0
compact_pagemigrate_failed 0
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 0
compact_balloon_failed 0
compact_balloon_isolated 0
compact_balloon_freed 0
[root@localhost ~]#
[root@localhost ~]# for i in $(seq 1 4); do echo 1 > /proc/sys/vm/compact_memory & done &>/dev/null
[1] Done echo 1 > /proc/sys/vm/compact_memory
[2] Done echo 1 > /proc/sys/vm/compact_memory
[3]- Done echo 1 > /proc/sys/vm/compact_memory
[4]+ Done echo 1 > /proc/sys/vm/compact_memory
[root@localhost ~]#
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 2598
compact_pages_moved 47660
compact_pagemigrate_failed 103
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 26652
compact_balloon_failed 76
compact_balloon_isolated 26728
compact_balloon_freed 26652
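The balloon counters can be cross-checked by hand, assuming 4KB pages: a 64MB balloon is 64 * 1024 / 4 = 16384 pages, matching compact_balloon_migrated in the first run, and a 128MB balloon is 32768 pages. In both runs compact_balloon_isolated equals compact_balloon_migrated plus compact_balloon_failed (16384 = 16384 + 0 and 26728 = 26652 + 76). The helpers below (hypothetical names) just encode that arithmetic.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of 4KB pages in a balloon of 'mb' megabytes (assumes 4KB pages). */
static unsigned long balloon_pages(unsigned long mb)
{
    return mb * 1024 / 4;
}

/* True when every isolated balloon page was either migrated or failed. */
static bool balloon_counters_consistent(unsigned long isolated,
                                        unsigned long migrated,
                                        unsigned long failed)
{
    return isolated == migrated + failed;
}
```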
--
1.7.10.2
* [PATCH v3 1/4] mm: introduce compaction and migration for virtio ballooned pages
From: Rafael Aquini @ 2012-07-03 23:48 UTC
To: linux-mm
Cc: Rik van Riel, Rafael Aquini, Konrad Rzeszutek Wilk,
Michael S. Tsirkin, linux-kernel, virtualization, Minchan Kim,
Andi Kleen, Andrew Morton
In-Reply-To: <cover.1341353014.git.aquini@redhat.com>
This patch introduces the helper functions as well as the necessary changes
to teach compaction and migration bits how to cope with pages which are
part of a guest memory balloon, in order to make them movable by memory
compaction procedures.
Signed-off-by: Rafael Aquini <aquini@redhat.com>
---
include/linux/mm.h | 15 +++++++
mm/compaction.c | 126 ++++++++++++++++++++++++++++++++++++++++++++--------
mm/migrate.c | 30 ++++++++++++-
3 files changed, 151 insertions(+), 20 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b36d08c..3112198 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1629,5 +1629,20 @@ static inline unsigned int debug_guardpage_minorder(void) { return 0; }
static inline bool page_is_guard(struct page *page) { return false; }
#endif /* CONFIG_DEBUG_PAGEALLOC */
+#if (defined(CONFIG_VIRTIO_BALLOON) || \
+ defined(CONFIG_VIRTIO_BALLOON_MODULE)) && defined(CONFIG_COMPACTION)
+extern bool putback_balloon_page(struct page *);
+extern struct address_space *balloon_mapping;
+
+static inline bool is_balloon_page(struct page *page)
+{
+ return (page->mapping == balloon_mapping) ? true : false;
+}
+#else
+static inline bool is_balloon_page(struct page *page) { return false; }
+static inline bool isolate_balloon_page(struct page *page) { return false; }
+static inline bool putback_balloon_page(struct page *page) { return false; }
+#endif /* (VIRTIO_BALLOON || VIRTIO_BALLOON_MODULE) && COMPACTION */
+
#endif /* __KERNEL__ */
#endif /* _LINUX_MM_H */
diff --git a/mm/compaction.c b/mm/compaction.c
index 7ea259d..887d0fc 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -14,6 +14,7 @@
#include <linux/backing-dev.h>
#include <linux/sysctl.h>
#include <linux/sysfs.h>
+#include <linux/export.h>
#include "internal.h"
#if defined CONFIG_COMPACTION || defined CONFIG_CMA
@@ -21,6 +22,85 @@
#define CREATE_TRACE_POINTS
#include <trace/events/compaction.h>
+#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
+/*
+ * Balloon pages special page->mapping.
+ * Users must properly allocate and initialize an instance of balloon_mapping,
+ * and set it as the page->mapping for balloon enlisted page instances.
+ * There is no need to use struct address_space locking schemes for
+ * balloon_mapping: once it gets initialized by the balloon driver, it
+ * remains a static reference that helps us identify a guest ballooned
+ * page by its mapping, and it keeps the 'a_ops' callback pointers to
+ * the functions that execute the balloon page mobility tasks.
+ *
+ * address_space_operations necessary methods for ballooned pages:
+ * .migratepage - used to perform balloon's page migration (as is)
+ * .invalidatepage - used to isolate a page from balloon's page list
+ * .freepage - used to reinsert an isolated page to balloon's page list
+ */
+struct address_space *balloon_mapping;
+EXPORT_SYMBOL_GPL(balloon_mapping);
+
+static inline void __isolate_balloon_page(struct page *page)
+{
+ page->mapping->a_ops->invalidatepage(page, 0);
+}
+
+static inline void __putback_balloon_page(struct page *page)
+{
+ page->mapping->a_ops->freepage(page);
+}
+
+/* __isolate_lru_page() counterpart for a ballooned page */
+static bool isolate_balloon_page(struct page *page)
+{
+ if (WARN_ON(!is_balloon_page(page)))
+ return false;
+
+ if (likely(get_page_unless_zero(page))) {
+ /*
+ * We can race against move_to_new_page() & __unmap_and_move().
+ * If we stumble across a locked balloon page and succeed on
+ * isolating it, the result tends to be disastrous.
+ */
+ if (likely(trylock_page(page))) {
+ /*
+ * A ballooned page, by default, has just one refcount.
+ * Prevent concurrent compaction threads from isolating
+ * an already isolated balloon page.
+ */
+ if (is_balloon_page(page) && (page_count(page) == 2)) {
+ __isolate_balloon_page(page);
+ unlock_page(page);
+ return true;
+ }
+ unlock_page(page);
+ }
+ /* Drop refcount taken for this already isolated page */
+ put_page(page);
+ }
+ return false;
+}
+
+/* putback_lru_page() counterpart for a ballooned page */
+bool putback_balloon_page(struct page *page)
+{
+ if (WARN_ON(!is_balloon_page(page)))
+ return false;
+
+ if (likely(trylock_page(page))) {
+ if (is_balloon_page(page)) {
+ __putback_balloon_page(page);
+ put_page(page);
+ unlock_page(page);
+ return true;
+ }
+ unlock_page(page);
+ }
+ return false;
+}
+#endif /* CONFIG_VIRTIO_BALLOON || CONFIG_VIRTIO_BALLOON_MODULE */
+
static unsigned long release_freepages(struct list_head *freelist)
{
struct page *page, *next;
@@ -312,32 +392,40 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
continue;
}
- if (!PageLRU(page))
- continue;
-
/*
- * PageLRU is set, and lru_lock excludes isolation,
- * splitting and collapsing (collapsing has already
- * happened if PageLRU is set).
+ * It is possible to migrate LRU pages and balloon pages.
+ * Skip any other type of page.
*/
- if (PageTransHuge(page)) {
- low_pfn += (1 << compound_order(page)) - 1;
- continue;
- }
+ if (PageLRU(page)) {
+ /*
+ * PageLRU is set, and lru_lock excludes isolation,
+ * splitting and collapsing (collapsing has already
+ * happened if PageLRU is set).
+ */
+ if (PageTransHuge(page)) {
+ low_pfn += (1 << compound_order(page)) - 1;
+ continue;
+ }
- if (!cc->sync)
- mode |= ISOLATE_ASYNC_MIGRATE;
+ if (!cc->sync)
+ mode |= ISOLATE_ASYNC_MIGRATE;
- lruvec = mem_cgroup_page_lruvec(page, zone);
+ lruvec = mem_cgroup_page_lruvec(page, zone);
- /* Try isolate the page */
- if (__isolate_lru_page(page, mode) != 0)
- continue;
+ /* Try isolate the page */
+ if (__isolate_lru_page(page, mode) != 0)
+ continue;
+
+ VM_BUG_ON(PageTransCompound(page));
- VM_BUG_ON(PageTransCompound(page));
+ /* Successfully isolated */
+ del_page_from_lru_list(page, lruvec, page_lru(page));
+ } else if (is_balloon_page(page)) {
+ if (!isolate_balloon_page(page))
+ continue;
+ } else
+ continue;
- /* Successfully isolated */
- del_page_from_lru_list(page, lruvec, page_lru(page));
list_add(&page->lru, migratelist);
cc->nr_migratepages++;
nr_isolated++;
diff --git a/mm/migrate.c b/mm/migrate.c
index be26d5c..59c7bc5 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -78,7 +78,10 @@ void putback_lru_pages(struct list_head *l)
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
- putback_lru_page(page);
+ if (unlikely(is_balloon_page(page)))
+ WARN_ON(!putback_balloon_page(page));
+ else
+ putback_lru_page(page);
}
}
@@ -783,6 +786,17 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
}
}
+ if (is_balloon_page(page)) {
+ /*
+ * A ballooned page does not need any special attention from
+ * physical to virtual reverse mapping procedures.
+ * Skip any attempt to unmap PTEs or to remap swap cache,
+ * in order to avoid burning cycles at rmap level.
+ */
+ remap_swapcache = 0;
+ goto skip_unmap;
+ }
+
/*
* Corner case handling:
* 1. When a new swap-cache page is read into, it is added to the LRU
@@ -852,6 +866,20 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
goto out;
rc = __unmap_and_move(page, newpage, force, offlining, mode);
+
+ if (is_balloon_page(newpage)) {
+ /*
+ * A ballooned page has been migrated already. Now it is time to
+ * wrap up counters, hand the old page back to the buddy allocator
+ * and return.
+ */
+ list_del(&page->lru);
+ dec_zone_page_state(page, NR_ISOLATED_ANON +
+ page_is_file_cache(page));
+ put_page(page);
+ __free_page(page);
+ return rc;
+ }
out:
if (rc != -EAGAIN) {
/*
--
1.7.10.4
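The refcount reasoning in isolate_balloon_page() can be exercised outside the kernel. The following is a hypothetical single-threaded userspace model using C11 atomics; struct model_page and the model_* helpers are invented stand-ins for struct page, get_page_unless_zero(), trylock_page() and unlock_page(), and the model ignores everything else the real page lock and LRU machinery do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: refcount, lock bit, type tag. */
struct model_page {
    atomic_int count;    /* like page_count(); a balloon page starts at 1 */
    atomic_flag locked;  /* like PG_locked */
    bool is_balloon;     /* like page->mapping == balloon_mapping */
};

static void model_page_init(struct model_page *p, bool is_balloon)
{
    atomic_init(&p->count, 1);
    atomic_flag_clear(&p->locked);
    p->is_balloon = is_balloon;
}

/* Like get_page_unless_zero(): take a reference only if count > 0. */
static bool model_get_unless_zero(struct model_page *p)
{
    int c = atomic_load(&p->count);

    while (c > 0) {
        /* On failure, c is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&p->count, &c, c + 1))
            return true;
    }
    return false;
}

/* Like trylock_page() and unlock_page(). */
static bool model_trylock(struct model_page *p)
{
    return !atomic_flag_test_and_set(&p->locked);
}

static void model_unlock(struct model_page *p)
{
    atomic_flag_clear(&p->locked);
}

/*
 * Same control flow as isolate_balloon_page() in the v3 patch: isolate
 * only if, after our get, exactly two references exist (the balloon's
 * own plus ours), meaning nobody else has isolated the page already.
 */
static bool model_isolate(struct model_page *p)
{
    if (!p->is_balloon)
        return false;

    if (model_get_unless_zero(p)) {
        if (model_trylock(p)) {
            if (p->is_balloon && atomic_load(&p->count) == 2) {
                /* __isolate_balloon_page() would unlink the page here. */
                model_unlock(p);
                return true;  /* our reference stays with the isolated page */
            }
            model_unlock(p);
        }
        /* Isolation failed: drop the reference we just took. */
        atomic_fetch_sub(&p->count, 1);
    }
    return false;
}
```

A second isolation attempt fails because the first one still holds its reference, so the count is 3 after the new get; that is the "already isolated balloon page" case the patch comment describes. A page whose lock is held elsewhere also fails, and the temporary reference is dropped either way.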
* [PATCH v3 2/4] virtio_balloon: handle concurrent accesses to virtio_balloon struct elements
From: Rafael Aquini @ 2012-07-03 23:48 UTC
To: linux-mm
Cc: Rik van Riel, Rafael Aquini, Konrad Rzeszutek Wilk,
Michael S. Tsirkin, linux-kernel, virtualization, Minchan Kim,
Andi Kleen, Andrew Morton
In-Reply-To: <cover.1341353014.git.aquini@redhat.com>
This patch introduces access synchronization to critical elements of struct
virtio_balloon, in order to cope with the thread concurrency that the
compaction/migration bits might end up imposing on the balloon driver in
several situations.
Signed-off-by: Rafael Aquini <aquini@redhat.com>
---
drivers/virtio/virtio_balloon.c | 45 +++++++++++++++++++++++++++++----------
1 file changed, 34 insertions(+), 11 deletions(-)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index bfbc15c..d47c5c2 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,6 +51,10 @@ struct virtio_balloon
/* Number of balloon pages we've told the Host we're not using. */
unsigned int num_pages;
+
+ /* Protect 'pages', 'pfns' & 'num_pfns' against concurrent updates */
+ spinlock_t pfn_list_lock;
+
/*
* The pages we've told the Host we're not using.
* Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
@@ -97,21 +101,23 @@ static void balloon_ack(struct virtqueue *vq)
complete(&vb->acked);
}
-static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
-{
- struct scatterlist sg;
-
- sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
+/* Protection for concurrent accesses to balloon virtqueues and vb->acked */
+DEFINE_MUTEX(vb_queue_completion);
+static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq,
+ struct scatterlist *sg)
+{
+ mutex_lock(&vb_queue_completion);
init_completion(&vb->acked);
/* We should always be able to add one buffer to an empty queue. */
- if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
+ if (virtqueue_add_buf(vq, sg, 1, 0, vb, GFP_KERNEL) < 0)
BUG();
virtqueue_kick(vq);
/* When host has read buffer, this completes via balloon_ack */
wait_for_completion(&vb->acked);
+ mutex_unlock(&vb_queue_completion);
}
static void set_page_pfns(u32 pfns[], struct page *page)
@@ -126,9 +132,12 @@ static void set_page_pfns(u32 pfns[], struct page *page)
static void fill_balloon(struct virtio_balloon *vb, size_t num)
{
+ struct scatterlist sg;
+ int alloc_failed = 0;
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));
+ spin_lock(&vb->pfn_list_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
@@ -138,8 +147,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
dev_printk(KERN_INFO, &vb->vdev->dev,
"Out of puff! Can't get %zu pages\n",
num);
- /* Sleep for at least 1/5 of a second before retry. */
- msleep(200);
+ alloc_failed = 1;
break;
}
set_page_pfns(vb->pfns + vb->num_pfns, page);
@@ -149,10 +157,19 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
}
/* Didn't get any? Oh well. */
- if (vb->num_pfns == 0)
+ if (vb->num_pfns == 0) {
+ spin_unlock(&vb->pfn_list_lock);
return;
+ }
+
+ sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
+ spin_unlock(&vb->pfn_list_lock);
- tell_host(vb, vb->inflate_vq);
+ /* alloc_page failed, sleep for at least 1/5 of a sec before retry. */
+ if (alloc_failed)
+ msleep(200);
+
+ tell_host(vb, vb->inflate_vq, &sg);
}
static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -169,10 +186,12 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
static void leak_balloon(struct virtio_balloon *vb, size_t num)
{
struct page *page;
+ struct scatterlist sg;
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb->pfns));
+ spin_lock(&vb->pfn_list_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
page = list_first_entry(&vb->pages, struct page, lru);
@@ -180,13 +199,15 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
set_page_pfns(vb->pfns + vb->num_pfns, page);
vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
}
+ sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
+ spin_unlock(&vb->pfn_list_lock);
/*
* Note that if
* virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
* is true, we *have* to do it in this order
*/
- tell_host(vb, vb->deflate_vq);
+ tell_host(vb, vb->deflate_vq, &sg);
release_pages_by_pfn(vb->pfns, vb->num_pfns);
}
@@ -356,6 +377,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
}
INIT_LIST_HEAD(&vb->pages);
+ spin_lock_init(&vb->pfn_list_lock);
+
vb->num_pages = 0;
init_waitqueue_head(&vb->config_change);
vb->vdev = vdev;
--
1.7.10.4
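Patch 2/4 splits the locking in two: vb->pfn_list_lock (a spinlock) covers the PFN array and page list, while the vb_queue_completion mutex serializes tell_host(), which sleeps in wait_for_completion(). The patch also moves msleep(200) out of the spinlocked region. Below is a hypothetical single-threaded userspace model of that split using C11 atomic_flag for both locks (the kernel uses a real mutex for the second one because it sleeps); all model_* names are invented. One deliberate difference: sg_init_one() only records a pointer to vb->pfns rather than copying it, whereas this sketch copies the batch while still holding the spinlock, so what gets sent cannot change after the lock is dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define BATCH 4  /* hypothetical stand-in for ARRAY_SIZE(vb->pfns) */

/* Hypothetical model of the state patch 2/4 protects. */
struct model_balloon {
    atomic_flag pfn_list_lock;  /* kernel: spinlock, never held while sleeping */
    atomic_flag queue_lock;     /* kernel: mutex held across the sleeping send */
    unsigned int pfns[BATCH];
    size_t num_pfns;
    size_t host_seen;           /* PFNs the fake host has received */
    unsigned int host_last;     /* last PFN the fake host received */
};

static void model_balloon_init(struct model_balloon *b)
{
    atomic_flag_clear(&b->pfn_list_lock);
    atomic_flag_clear(&b->queue_lock);
    b->num_pfns = 0;
    b->host_seen = 0;
    b->host_last = 0;
}

static void spin_acquire(atomic_flag *f)
{
    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
        ;  /* busy-wait until the holder releases */
}

static void spin_release(atomic_flag *f)
{
    atomic_flag_clear_explicit(f, memory_order_release);
}

/* Fake host: stands in for add_buf + kick + wait_for_completion(). */
static void model_tell_host(struct model_balloon *b,
                            const unsigned int *pfns, size_t n)
{
    spin_acquire(&b->queue_lock);
    b->host_seen += n;
    if (n > 0)
        b->host_last = pfns[n - 1];
    spin_release(&b->queue_lock);
}

/* Like fill_balloon(): update shared state locked, notify host unlocked. */
static void model_fill(struct model_balloon *b, unsigned int first_pfn,
                       size_t num)
{
    unsigned int snapshot[BATCH];
    size_t n;

    if (num > BATCH)
        num = BATCH;  /* one array worth at a time */

    spin_acquire(&b->pfn_list_lock);
    for (b->num_pfns = 0; b->num_pfns < num; b->num_pfns++)
        b->pfns[b->num_pfns] = first_pfn + (unsigned int)b->num_pfns;
    n = b->num_pfns;
    /* Copy under the lock so later updates cannot change what is sent. */
    memcpy(snapshot, b->pfns, n * sizeof(snapshot[0]));
    spin_release(&b->pfn_list_lock);

    if (n == 0)
        return;  /* nothing to report, as in the driver */

    /* Anything that may block happens with the spinlock dropped. */
    model_tell_host(b, snapshot, n);
}
```

The point the model is meant to show is ordering: mutate shared state under the spinlock, drop it, and only then do anything that may block.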
* [PATCH v3 3/4] virtio_balloon: introduce migration primitives to balloon pages
From: Rafael Aquini @ 2012-07-03 23:48 UTC
To: linux-mm
Cc: Rik van Riel, Rafael Aquini, Konrad Rzeszutek Wilk,
Michael S. Tsirkin, linux-kernel, virtualization, Minchan Kim,
Andi Kleen, Andrew Morton
In-Reply-To: <cover.1341353014.git.aquini@redhat.com>
This patch makes balloon pages movable at allocation time and introduces the
infrastructure needed to perform the balloon page migration operation.
Signed-off-by: Rafael Aquini <aquini@redhat.com>
---
drivers/virtio/virtio_balloon.c | 96 ++++++++++++++++++++++++++++++++++++++-
include/linux/virtio_balloon.h | 4 ++
2 files changed, 98 insertions(+), 2 deletions(-)
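virtballoon_migratepage() in this patch swaps pages with two host notifications. The model below is a hypothetical userspace sketch of only the host-visible bookkeeping (struct model_host, host_inflate(), host_deflate() and model_migrate() are invented names): it follows the same order as the patch comment, reporting the new page on the inflate queue first and the old page on the deflate queue second.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PFNS 64  /* hypothetical guest size, in pages */

/* Hypothetical host-side view: which guest PFNs the balloon handed over. */
struct model_host {
    bool ballooned[MODEL_PFNS];
    size_t total;
};

static void host_inflate(struct model_host *h, unsigned int pfn)
{
    if (pfn < MODEL_PFNS && !h->ballooned[pfn]) {
        h->ballooned[pfn] = true;
        h->total++;
    }
}

static void host_deflate(struct model_host *h, unsigned int pfn)
{
    if (pfn < MODEL_PFNS && h->ballooned[pfn]) {
        h->ballooned[pfn] = false;
        h->total--;
    }
}

/*
 * Two-step swap in the order virtballoon_migratepage() uses:
 * 1) report the new page via inflate, 2) report the old page via deflate.
 */
static void model_migrate(struct model_host *h, unsigned int old_pfn,
                          unsigned int new_pfn)
{
    host_inflate(h, new_pfn);
    host_deflate(h, old_pfn);
}
```

In this model the host briefly counts both pages between the two calls, and the total returns to its previous value once the old page is deflated.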
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index d47c5c2..53386aa 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,6 +27,8 @@
#include <linux/delay.h>
#include <linux/slab.h>
#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/pagemap.h>
/*
* Balloon device works in 4K page units. So each page is pointed to by
@@ -140,8 +142,9 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
spin_lock(&vb->pfn_list_lock);
for (vb->num_pfns = 0; vb->num_pfns < num;
vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
- struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
- __GFP_NOMEMALLOC | __GFP_NOWARN);
+ struct page *page = alloc_page(GFP_HIGHUSER_MOVABLE |
+ __GFP_NORETRY | __GFP_NOWARN |
+ __GFP_NOMEMALLOC);
if (!page) {
if (printk_ratelimit())
dev_printk(KERN_INFO, &vb->vdev->dev,
@@ -154,6 +157,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
totalram_pages--;
list_add(&page->lru, &vb->pages);
+ page->mapping = balloon_mapping;
}
/* Didn't get any? Oh well. */
@@ -195,6 +199,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
for (vb->num_pfns = 0; vb->num_pfns < num;
vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
page = list_first_entry(&vb->pages, struct page, lru);
+ page->mapping = NULL;
list_del(&page->lru);
set_page_pfns(vb->pfns + vb->num_pfns, page);
vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
@@ -365,6 +370,77 @@ static int init_vqs(struct virtio_balloon *vb)
return 0;
}
+/*
+ * Populate balloon_mapping->a_ops->migratepage method to perform the balloon
+ * page migration task.
+ *
+ * After a ballooned page gets isolated by compaction procedures, this is the
+ * function that performs the page migration on behalf of move_to_new_page(),
+ * when the latter calls (page)->mapping->a_ops->migratepage.
+ *
+ * Page migration for virtio balloon is done in a simple swap fashion which
+ * follows these two steps:
+ * 1) insert newpage into vb->pages list and update the host about it;
+ * 2) update the host about the removed old page from vb->pages list;
+ */
+int virtballoon_migratepage(struct address_space *mapping,
+ struct page *newpage, struct page *page, enum migrate_mode mode)
+{
+ struct virtio_balloon *vb = (void *)mapping->backing_dev_info;
+ struct scatterlist sg;
+
+ /* balloon's page migration 1st step */
+ spin_lock(&vb->pfn_list_lock);
+ vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
+ list_add(&newpage->lru, &vb->pages);
+ set_page_pfns(vb->pfns, newpage);
+ sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
+ spin_unlock(&vb->pfn_list_lock);
+ tell_host(vb, vb->inflate_vq, &sg);
+
+ /* balloon's page migration 2nd step */
+ spin_lock(&vb->pfn_list_lock);
+ vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
+ set_page_pfns(vb->pfns, page);
+ sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
+ spin_unlock(&vb->pfn_list_lock);
+ tell_host(vb, vb->deflate_vq, &sg);
+
+ return 0;
+}
+
+/*
+ * Populate balloon_mapping->a_ops->invalidatepage method to help compaction
+ * isolate a page from the balloon page list.
+ */
+void virtballoon_isolatepage(struct page *page, unsigned long mode)
+{
+ struct address_space *mapping = page->mapping;
+ struct virtio_balloon *vb = (void *)mapping->backing_dev_info;
+ spin_lock(&vb->pfn_list_lock);
+ list_del(&page->lru);
+ spin_unlock(&vb->pfn_list_lock);
+}
+
+/*
+ * Populate balloon_mapping->a_ops->freepage method to help compaction
+ * re-insert an isolated page into the balloon page list.
+ */
+void virtballoon_putbackpage(struct page *page)
+{
+ struct address_space *mapping = page->mapping;
+ struct virtio_balloon *vb = (void *)mapping->backing_dev_info;
+ spin_lock(&vb->pfn_list_lock);
+ list_add(&page->lru, &vb->pages);
+ spin_unlock(&vb->pfn_list_lock);
+}
+
+static const struct address_space_operations virtio_balloon_aops = {
+ .migratepage = virtballoon_migratepage,
+ .invalidatepage = virtballoon_isolatepage,
+ .freepage = virtballoon_putbackpage,
+};
+
static int virtballoon_probe(struct virtio_device *vdev)
{
struct virtio_balloon *vb;
@@ -384,6 +460,19 @@ static int virtballoon_probe(struct virtio_device *vdev)
vb->vdev = vdev;
vb->need_stats_update = 0;
+ /* Init the ballooned page->mapping special balloon_mapping */
+ balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL);
+ if (!balloon_mapping) {
+ err = -ENOMEM;
+ goto out_free_mapping;
+ }
+
+ INIT_RADIX_TREE(&balloon_mapping->page_tree, GFP_ATOMIC | __GFP_NOWARN);
+ INIT_LIST_HEAD(&balloon_mapping->i_mmap_nonlinear);
+ spin_lock_init(&balloon_mapping->tree_lock);
+ balloon_mapping->a_ops = &virtio_balloon_aops;
+ balloon_mapping->backing_dev_info = (void *)vb;
+
err = init_vqs(vb);
if (err)
goto out_free_vb;
@@ -398,6 +487,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
out_del_vqs:
vdev->config->del_vqs(vdev);
+out_free_mapping:
+ kfree(balloon_mapping);
out_free_vb:
kfree(vb);
out:
@@ -424,6 +515,7 @@ static void __devexit virtballoon_remove(struct virtio_device *vdev)
kthread_stop(vb->thread);
remove_common(vb);
kfree(vb);
+ kfree(balloon_mapping);
}
#ifdef CONFIG_PM
diff --git a/include/linux/virtio_balloon.h b/include/linux/virtio_balloon.h
index 652dc8b..930f1b7 100644
--- a/include/linux/virtio_balloon.h
+++ b/include/linux/virtio_balloon.h
@@ -56,4 +56,8 @@ struct virtio_balloon_stat {
u64 val;
} __attribute__((packed));
+#if !defined(CONFIG_COMPACTION)
+struct address_space *balloon_mapping;
+#endif
+
#endif /* _LINUX_VIRTIO_BALLOON_H */
--
1.7.10.4
^ permalink raw reply related
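The comment above virtballoon_migratepage() describes migration as a two-step swap: the new page joins vb->pages and is reported to the host via the inflate queue, then the old page (already unlinked by the isolate callback) is reported via the deflate queue. The following is a hypothetical userspace model of that ordering only; balloon_model, isolate_model(), migrate_model() and the host log are invented for illustration and are not kernel APIs, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAGES 16
#define MAX_LOG   16

/* Hypothetical model: the balloon list holds PFNs and "telling the host"
 * just appends a message to a log. */
enum host_msg { HOST_INFLATE, HOST_DEFLATE };

struct balloon_model {
	unsigned long pages[MAX_PAGES];	/* PFNs currently in the balloon */
	size_t npages;
	struct { enum host_msg msg; unsigned long pfn; } log[MAX_LOG];
	size_t nlog;
};

static void tell_host_model(struct balloon_model *b, enum host_msg msg,
			    unsigned long pfn)
{
	assert(b->nlog < MAX_LOG);
	b->log[b->nlog].msg = msg;
	b->log[b->nlog].pfn = pfn;
	b->nlog++;
}

/* Stand-in for the isolate callback: unlink the old page from the list. */
static void isolate_model(struct balloon_model *b, unsigned long pfn)
{
	for (size_t i = 0; i < b->npages; i++) {
		if (b->pages[i] == pfn) {
			b->pages[i] = b->pages[--b->npages];
			return;
		}
	}
}

/* Two-step swap: 1) list the new page and inflate it on the host;
 * 2) deflate the old, already-isolated page. */
static void migrate_model(struct balloon_model *b, unsigned long newpfn,
			  unsigned long oldpfn)
{
	assert(b->npages < MAX_PAGES);
	b->pages[b->npages++] = newpfn;
	tell_host_model(b, HOST_INFLATE, newpfn);
	tell_host_model(b, HOST_DEFLATE, oldpfn);
}
```

Under these assumptions the balloon never shrinks from the host's point of view: the replacement page is inflated before the original is deflated.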
* [PATCH v3 4/4] mm: add vm event counters for balloon pages compaction
From: Rafael Aquini @ 2012-07-03 23:48 UTC (permalink / raw)
To: linux-mm
Cc: Rik van Riel, Rafael Aquini, Konrad Rzeszutek Wilk,
Michael S. Tsirkin, linux-kernel, virtualization, Minchan Kim,
Andi Kleen, Andrew Morton
In-Reply-To: <cover.1341353014.git.aquini@redhat.com>
This patch is only for testing/reporting purposes and shall be dropped if
the rest of this patchset is accepted for merging.
Signed-off-by: Rafael Aquini <aquini@redhat.com>
---
drivers/virtio/virtio_balloon.c | 1 +
include/linux/vm_event_item.h | 2 ++
mm/compaction.c | 1 +
mm/migrate.c | 6 ++++--
mm/vmstat.c | 4 ++++
5 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 53386aa..c4a929d 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -406,6 +406,7 @@ int virtballoon_migratepage(struct address_space *mapping,
spin_unlock(&vb->pfn_list_lock);
tell_host(vb, vb->deflate_vq, &sg);
+ count_vm_event(COMPACTBALLOONMIGRATED);
return 0;
}
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 06f8e38..e330c5a 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -40,6 +40,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
#ifdef CONFIG_COMPACTION
COMPACTBLOCKS, COMPACTPAGES, COMPACTPAGEFAILED,
COMPACTSTALL, COMPACTFAIL, COMPACTSUCCESS,
+ COMPACTBALLOONMIGRATED, COMPACTBALLOONFAILED,
+ COMPACTBALLOONISOLATED, COMPACTBALLOONFREED,
#endif
#ifdef CONFIG_HUGETLB_PAGE
HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
diff --git a/mm/compaction.c b/mm/compaction.c
index 887d0fc..8f7df01 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -72,6 +72,7 @@ static bool isolate_balloon_page(struct page *page)
if (is_balloon_page(page) && (page_count(page) == 2)) {
__isolate_balloon_page(page);
unlock_page(page);
+ count_vm_event(COMPACTBALLOONISOLATED);
return true;
}
unlock_page(page);
diff --git a/mm/migrate.c b/mm/migrate.c
index 59c7bc5..5838719 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -78,9 +78,10 @@ void putback_lru_pages(struct list_head *l)
list_del(&page->lru);
dec_zone_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
- if (unlikely(is_balloon_page(page)))
+ if (unlikely(is_balloon_page(page))) {
+ count_vm_event(COMPACTBALLOONFAILED);
WARN_ON(!putback_balloon_page(page));
- else
+ } else
putback_lru_page(page);
}
}
@@ -878,6 +879,7 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
page_is_file_cache(page));
put_page(page);
__free_page(page);
+ count_vm_event(COMPACTBALLOONFREED);
return rc;
}
out:
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 1bbbbd9..3b7109f 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -767,6 +767,10 @@ const char * const vmstat_text[] = {
"compact_stall",
"compact_fail",
"compact_success",
+ "compact_balloon_migrated",
+ "compact_balloon_failed",
+ "compact_balloon_isolated",
+ "compact_balloon_freed",
#endif
#ifdef CONFIG_HUGETLB_PAGE
--
1.7.10.4
^ permalink raw reply related
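The counters patch adds four entries to enum vm_event_item and four matching strings to vmstat_text[], and the two must stay in the same order. A hypothetical userspace sketch of that pairing follows; balloon_event, count_event() and read_event() are illustrative stand-ins (not the kernel's count_vm_event() machinery), and the compile-time size check is an addition of this sketch, not something the patch does.

```c
#include <assert.h>
#include <string.h>

/* One enum value per counter, one name per value, in the same order. */
enum balloon_event {
	COMPACTBALLOONMIGRATED,
	COMPACTBALLOONFAILED,
	COMPACTBALLOONISOLATED,
	COMPACTBALLOONFREED,
	NR_BALLOON_EVENTS
};

static const char *const balloon_event_text[] = {
	"compact_balloon_migrated",
	"compact_balloon_failed",
	"compact_balloon_isolated",
	"compact_balloon_freed",
};

/* Fails to compile if an enum entry is added without a matching name. */
static_assert(sizeof(balloon_event_text) / sizeof(balloon_event_text[0]) ==
	      NR_BALLOON_EVENTS, "enum and name table out of sync");

static unsigned long balloon_events[NR_BALLOON_EVENTS];

/* Userspace stand-in for count_vm_event(). */
static void count_event(enum balloon_event item)
{
	balloon_events[item]++;
}

/* Look up a counter by its /proc/vmstat-style name; -1 if unknown. */
static long read_event(const char *name)
{
	for (int i = 0; i < NR_BALLOON_EVENTS; i++)
		if (strcmp(balloon_event_text[i], name) == 0)
			return (long)balloon_events[i];
	return -1;
}
```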
* Re: [PATCH 3/3] virtio-blk: Add bio-based IO path for virtio-blk
From: Rusty Russell @ 2012-07-04 2:40 UTC (permalink / raw)
To: Asias He
Cc: kvm, Michael S. Tsirkin, linux-kernel, virtualization,
Sasha Levin, Christoph Hellwig
In-Reply-To: <4FF23F4B.2040803@redhat.com>
On Tue, 03 Jul 2012 08:39:39 +0800, Asias He <asias@redhat.com> wrote:
> On 07/02/2012 02:41 PM, Rusty Russell wrote:
> > Sure, our guest merging might save us 100x as many exits as no merging.
> > But since we're not doing many requests, does it matter?
>
> We can still have many requests with slow devices. The number of
> requests depends on the workload in guest. E.g. 512 IO threads in guest
> keeping doing IO.
You can have many requests outstanding. But if the device is slow, the
rate of requests being serviced must be low.
Am I misunderstanding something? I thought if you could have a high
rate of requests, it's not a slow device.
Cheers,
Rusty.
^ permalink raw reply
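Rusty's distinction between requests outstanding and requests serviced can be made concrete with Little's law (L = λW). The sketch below assumes a deliberately simplified device that services one request at a time with a fixed service time and ignores any reordering in the host; device_model, rate_per_s() and latency_ms() are hypothetical names for illustration only.

```c
#include <assert.h>

/* Hypothetical closed-loop model of a slow device: one request is
 * serviced at a time and each takes service_ms milliseconds, while the
 * guest keeps `outstanding` requests queued. */
struct device_model {
	double service_ms;
};

/* Completion rate in requests/second. With at least one request queued
 * the device is always busy, so the rate is 1000 / service_ms no matter
 * how deep the queue is. */
static double rate_per_s(const struct device_model *d, unsigned outstanding)
{
	assert(d->service_ms > 0.0);
	return outstanding ? 1000.0 / d->service_ms : 0.0;
}

/* Little's law, W = L / lambda: mean time each request spends queued
 * plus being serviced, in milliseconds. */
static double latency_ms(const struct device_model *d, unsigned outstanding)
{
	double lambda = rate_per_s(d, outstanding);

	return lambda > 0.0 ? 1000.0 * outstanding / lambda : 0.0;
}
```

With an assumed 5 ms service time, 1 and 512 outstanding requests both complete at 200 requests/s in this model; the deeper queue only raises per-request latency, to 2560 ms at depth 512.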
* Re: [PATCH RFC] virtio-balloon: fix add/get API use
From: Rusty Russell @ 2012-07-04 3:27 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: virtualization, Rafael Aquini, kvm, linux-kernel
In-Reply-To: <20120702073308.GB8268@redhat.com>
On Mon, 2 Jul 2012 10:33:08 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> In virtio balloon virtqueue_get_buf might now run concurrently with
> virtqueue_kick. I audited both and this seems safe in practice but
> this is not guaranteed by the API.
> Additionally, a spurious interrupt might in theory make
> virtqueue_get_buf run in parallel with virtqueue_add_buf, which is racy.
>
> While we might try to protect against spurious callbacks it's
> easier to fix the driver: balloon seems to be the only one
> (mis)using the API like this, so let's just fix balloon.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
I was thinking of a spinlock, but this is far more elegant.
And I added an explicit reference to the 'virtio: expose added
descriptors immediately.' commit in your commit msg.
Kudos!
Rusty.
^ permalink raw reply
* [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
From: Nicholas A. Bellinger @ 2012-07-04 4:24 UTC (permalink / raw)
To: target-devel
Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, Michael S. Tsirkin,
Zhi Yong Wu, Anthony Liguori, linux-scsi, Paolo Bonzini, lf-virt,
Christoph Hellwig
From: Nicholas Bellinger <nab@linux-iscsi.org>
Hi folks,
This series contains patches required to update tcm_vhost <-> virtio-scsi
connected hosts <-> guests to run on v3.5-rc2 mainline code. This series is
available on top of target-pending/auto-next here:
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git tcm_vhost
This includes the necessary vhost changes from Stefan to get tcm_vhost
functioning, along with a virtio-scsi LUN scanning change to address a client bug
with tcm_vhost that I ran into. Also, the tcm_vhost driver has been merged into a single
source + header file that now lives under /drivers/vhost/, along with the latest
tcm_vhost changes from Zhi's tcm_vhost tree.
Here are a couple of screenshots of the code in action using raw IBLOCK
backends provided by FusionIO ioDrive Duo:
http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-3.png
http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-4.png
So the next steps on my end will be converting tcm_vhost to submit backend I/O from
cmwq context, and collecting fio benchmark numbers comparing tcm_vhost/virtio-scsi with
virtio-scsi-raw using raw IBLOCK iomemory_vsl flash.
Please have a look, vhost + virtio-scsi folks (mst + paolo CC'ed), and let us
know if you have any concerns.
Thanks!
--nab
Nicholas Bellinger (4):
vhost: Add vhost_scsi specific defines
tcm_vhost: Initial merge for vhost level target fabric driver
virtio-scsi: Add vdrv->scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN
scanning
virtio-scsi: Set shost->max_id=1 for tcm_vhost WWPNs
Stefan Hajnoczi (2):
vhost: Separate vhost-net features from vhost features
vhost: make vhost work queue visible
drivers/scsi/virtio_scsi.c | 20 +-
drivers/vhost/Kconfig | 6 +
drivers/vhost/Makefile | 1 +
drivers/vhost/net.c | 4 +-
drivers/vhost/tcm_vhost.c | 1592 ++++++++++++++++++++++++++++++++++++++++++++
drivers/vhost/tcm_vhost.h | 70 ++
drivers/vhost/vhost.c | 5 +-
drivers/vhost/vhost.h | 6 +-
drivers/virtio/virtio.c | 5 +-
include/linux/vhost.h | 9 +
include/linux/virtio.h | 1 +
11 files changed, 1708 insertions(+), 11 deletions(-)
create mode 100644 drivers/vhost/tcm_vhost.c
create mode 100644 drivers/vhost/tcm_vhost.h
--
1.7.2.5
^ permalink raw reply
* [PATCH 1/6] vhost: Separate vhost-net features from vhost features
From: Nicholas A. Bellinger @ 2012-07-04 4:24 UTC (permalink / raw)
To: target-devel
Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, Michael S. Tsirkin,
Zhi Yong Wu, Anthony Liguori, linux-scsi, Paolo Bonzini, lf-virt,
Christoph Hellwig
In-Reply-To: <1341375846-27882-1-git-send-email-nab@linux-iscsi.org>
From: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
In order for other vhost devices to use the VHOST_FEATURES bits, the
vhost-net specific bits need to be moved to their own VHOST_NET_FEATURES
constant.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas A. Bellinger <nab@risingtidesystems.com>
---
drivers/vhost/net.c | 4 ++--
drivers/vhost/vhost.h | 3 ++-
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f82a739..072cbba 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -823,14 +823,14 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
return -EFAULT;
return vhost_net_set_backend(n, backend.index, backend.fd);
case VHOST_GET_FEATURES:
- features = VHOST_FEATURES;
+ features = VHOST_NET_FEATURES;
if (copy_to_user(featurep, &features, sizeof features))
return -EFAULT;
return 0;
case VHOST_SET_FEATURES:
if (copy_from_user(&features, featurep, sizeof features))
return -EFAULT;
- if (features & ~VHOST_FEATURES)
+ if (features & ~VHOST_NET_FEATURES)
return -EOPNOTSUPP;
return vhost_net_set_features(n, features);
case VHOST_RESET_OWNER:
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 8de1fd5..07b9763 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -201,7 +201,8 @@ enum {
VHOST_FEATURES = (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) |
(1ULL << VIRTIO_RING_F_INDIRECT_DESC) |
(1ULL << VIRTIO_RING_F_EVENT_IDX) |
- (1ULL << VHOST_F_LOG_ALL) |
+ (1ULL << VHOST_F_LOG_ALL),
+ VHOST_NET_FEATURES = VHOST_FEATURES |
(1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
(1ULL << VIRTIO_NET_F_MRG_RXBUF),
};
--
1.7.2.5
^ permalink raw reply related
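After this split, VHOST_FEATURES holds only the bits any vhost device may offer, VHOST_NET_FEATURES is that set plus the two vhost-net bits, and vhost_net_ioctl() rejects any requested bit outside its mask with -EOPNOTSUPP. A self-contained sketch of that relationship follows; the bit numbers are copied locally (as found in the virtio/vhost headers of this period), and check_features() is a hypothetical helper with the same shape as the VHOST_SET_FEATURES check, not a kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Feature bit numbers, copied locally so the example is self-contained. */
#define VIRTIO_F_NOTIFY_ON_EMPTY	24
#define VIRTIO_RING_F_INDIRECT_DESC	28
#define VIRTIO_RING_F_EVENT_IDX		29
#define VHOST_F_LOG_ALL			26
#define VHOST_NET_F_VIRTIO_NET_HDR	27
#define VIRTIO_NET_F_MRG_RXBUF		15

/* Bits any vhost device may offer (mirrors the patched vhost.h). */
#define VHOST_FEATURES ((1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) | \
			(1ULL << VIRTIO_RING_F_INDIRECT_DESC) | \
			(1ULL << VIRTIO_RING_F_EVENT_IDX) | \
			(1ULL << VHOST_F_LOG_ALL))

/* vhost-net offers the generic bits plus its own. */
#define VHOST_NET_FEATURES (VHOST_FEATURES | \
			    (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) | \
			    (1ULL << VIRTIO_NET_F_MRG_RXBUF))

/* Any requested bit outside the device's supported mask is refused. */
static int check_features(uint64_t requested, uint64_t supported)
{
	return (requested & ~supported) ? -EOPNOTSUPP : 0;
}
```

A future vhost-scsi device can then validate against VHOST_FEATURES alone, so net-only bits such as VHOST_NET_F_VIRTIO_NET_HDR are refused there.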
* [PATCH 2/6] vhost: make vhost work queue visible
From: Nicholas A. Bellinger @ 2012-07-04 4:24 UTC (permalink / raw)
To: target-devel
Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, Michael S. Tsirkin,
Zhi Yong Wu, Anthony Liguori, linux-scsi, Paolo Bonzini, lf-virt,
Christoph Hellwig
In-Reply-To: <1341375846-27882-1-git-send-email-nab@linux-iscsi.org>
From: Stefan Hajnoczi <stefanha@gmail.com>
The vhost work queue allows processing to be done in vhost worker thread
context, which uses the owner process mm. Access to the vring and guest
memory is typically only possible from vhost worker context so it is
useful to allow work to be queued directly by users.
Currently vhost_net only uses the poll wrappers which do not expose the
work queue functions. However, for tcm_vhost (vhost_scsi) it will be
necessary to queue custom work.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
drivers/vhost/vhost.c | 5 ++---
drivers/vhost/vhost.h | 3 +++
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 94dbd25..1aab08b 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -64,7 +64,7 @@ static int vhost_poll_wakeup(wait_queue_t *wait, unsigned mode, int sync,
return 0;
}
-static void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
+void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
{
INIT_LIST_HEAD(&work->node);
work->fn = fn;
@@ -137,8 +137,7 @@ void vhost_poll_flush(struct vhost_poll *poll)
vhost_work_flush(poll->dev, &poll->work);
}
-static inline void vhost_work_queue(struct vhost_dev *dev,
- struct vhost_work *work)
+void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
{
unsigned long flags;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 07b9763..1125af3 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -43,6 +43,9 @@ struct vhost_poll {
struct vhost_dev *dev;
};
+void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
+void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
+
void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
unsigned long mask, struct vhost_dev *dev);
void vhost_poll_start(struct vhost_poll *poll, struct file *file);
--
1.7.2.5
^ permalink raw reply related
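Exporting vhost_work_init() and vhost_work_queue() lets a driver embed a work item in its own state, queue it, and have its callback run later in the worker thread; tcm_vhost uses this for vs_completion_work. The sketch below is a hypothetical single-threaded model of that pattern (work_init(), work_queue(), worker_run() and struct counter are invented names). It assumes, as in vhost.c, that queueing an item already on the list is a no-op; the real code also uses a spinlock and wakes a kernel thread, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

struct work;
typedef void (*work_fn_t)(struct work *w);

/* A work item is a list link plus a callback. */
struct work {
	struct work *next;
	work_fn_t fn;
	int queued;		/* already on the device's list? */
};

struct dev {
	struct work *head, *tail;
};

static void work_init(struct work *w, work_fn_t fn)
{
	w->next = NULL;
	w->fn = fn;
	w->queued = 0;
}

/* Append to the device's list; an already-pending item is left alone. */
static void work_queue(struct dev *d, struct work *w)
{
	if (w->queued)
		return;
	w->queued = 1;
	w->next = NULL;
	if (d->tail)
		d->tail->next = w;
	else
		d->head = w;
	d->tail = w;
}

/* One pass of the worker: drain the list, running each callback.
 * Returns how many callbacks ran. */
static int worker_run(struct dev *d)
{
	int ran = 0;

	while (d->head) {
		struct work *w = d->head;

		d->head = w->next;
		if (!d->head)
			d->tail = NULL;
		w->queued = 0;
		w->fn(w);
		ran++;
	}
	return ran;
}

/* Example user embedding its work item, as struct vhost_scsi embeds
 * vs_completion_work. */
struct counter {
	struct work work;	/* first member, so the cast below is valid */
	int hits;
};

static void counter_fn(struct work *w)
{
	struct counter *c = (struct counter *)w;	/* container_of stand-in */

	c->hits++;
}
```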
* [PATCH 3/6] vhost: Add vhost_scsi specific defines
From: Nicholas A. Bellinger @ 2012-07-04 4:24 UTC (permalink / raw)
To: target-devel
Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, Michael S. Tsirkin,
Zhi Yong Wu, Anthony Liguori, linux-scsi, Paolo Bonzini, lf-virt,
Nicholas Bellinger, Christoph Hellwig
In-Reply-To: <1341375846-27882-1-git-send-email-nab@linux-iscsi.org>
From: Nicholas Bellinger <nab@risingtidesystems.com>
This patch adds the initial VHOST_SCSI_SET_ENDPOINT and VHOST_SCSI_CLEAR_ENDPOINT
ioctl definitions used by vhost_scsi_ioctl(), and also adds struct vhost_vring_target,
which tcm_vhost code uses to locate target ports during qemu setup.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas A. Bellinger <nab@risingtidesystems.com>
---
include/linux/vhost.h | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/include/linux/vhost.h b/include/linux/vhost.h
index e847f1e..33b313b 100644
--- a/include/linux/vhost.h
+++ b/include/linux/vhost.h
@@ -24,7 +24,11 @@ struct vhost_vring_state {
struct vhost_vring_file {
unsigned int index;
int fd; /* Pass -1 to unbind from file. */
+};
+struct vhost_vring_target {
+ unsigned char vhost_wwpn[224];
+ unsigned short vhost_tpgt;
};
struct vhost_vring_addr {
@@ -121,6 +125,11 @@ struct vhost_memory {
* device. This can be used to stop the ring (e.g. for migration). */
#define VHOST_NET_SET_BACKEND _IOW(VHOST_VIRTIO, 0x30, struct vhost_vring_file)
+/* VHOST_SCSI specific defines */
+
+#define VHOST_SCSI_SET_ENDPOINT _IOW(VHOST_VIRTIO, 0x40, struct vhost_vring_target)
+#define VHOST_SCSI_CLEAR_ENDPOINT _IOW(VHOST_VIRTIO, 0x41, struct vhost_vring_target)
+
/* Feature bits */
/* Log all write descriptors. Can be changed while device is active. */
#define VHOST_F_LOG_ALL 26
--
1.7.2.5
^ permalink raw reply related
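The two new ioctls share the VHOST_VIRTIO type (0xAF), differ only in their command numbers (0x40 and 0x41), and encode sizeof(struct vhost_vring_target) as the payload written by userspace. The sketch below copies those definitions locally so it builds without a patched kernel, and decodes them with the standard <linux/ioctl.h> macros. fill_target() is a hypothetical helper showing how userspace might prepare the argument for ioctl(vhost_fd, VHOST_SCSI_SET_ENDPOINT, &t); it is not part of the patch.

```c
#include <assert.h>
#include <string.h>
#include <linux/ioctl.h>

/* Local copies of the definitions from this patch. */
#define VHOST_VIRTIO 0xAF

struct vhost_vring_target {
	unsigned char vhost_wwpn[224];
	unsigned short vhost_tpgt;
};

#define VHOST_SCSI_SET_ENDPOINT   _IOW(VHOST_VIRTIO, 0x40, struct vhost_vring_target)
#define VHOST_SCSI_CLEAR_ENDPOINT _IOW(VHOST_VIRTIO, 0x41, struct vhost_vring_target)

/* Hypothetical helper: fill the descriptor userspace would pass to the
 * ioctl. Returns -1 if the WWPN (plus its NUL) does not fit. */
static int fill_target(struct vhost_vring_target *t, const char *wwpn,
		       unsigned short tpgt)
{
	size_t len = strlen(wwpn);

	if (len >= sizeof(t->vhost_wwpn))
		return -1;
	memset(t, 0, sizeof(*t));
	memcpy(t->vhost_wwpn, wwpn, len + 1);
	t->vhost_tpgt = tpgt;
	return 0;
}
```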
* [PATCH 4/6] tcm_vhost: Initial merge for vhost level target fabric driver
From: Nicholas A. Bellinger @ 2012-07-04 4:24 UTC (permalink / raw)
To: target-devel
Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, Michael S. Tsirkin,
Zhi Yong Wu, Anthony Liguori, linux-scsi, Paolo Bonzini, lf-virt,
Christoph Hellwig
In-Reply-To: <1341375846-27882-1-git-send-email-nab@linux-iscsi.org>
From: Nicholas Bellinger <nab@linux-iscsi.org>
This patch adds the initial code for tcm_vhost, a vhost-level TCM
fabric driver for virtio SCSI initiators in KVM guests.
This code is currently up and running on v3.5-rc2 host+guest along with
the virtio-scsi vdev->scan() patch to allow a proper scsi_scan_host() to
occur once the tcm_vhost nexus has been established by the paravirtualized
virtio-scsi client.
(nab: Merge into single source + header file, and move to drivers/vhost/)
Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
drivers/vhost/Kconfig | 6 +
drivers/vhost/Makefile | 1 +
drivers/vhost/tcm_vhost.c | 1592 +++++++++++++++++++++++++++++++++++++++++++++
drivers/vhost/tcm_vhost.h | 70 ++
4 files changed, 1669 insertions(+), 0 deletions(-)
create mode 100644 drivers/vhost/tcm_vhost.c
create mode 100644 drivers/vhost/tcm_vhost.h
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index e4e2fd1..a8642e2 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -9,3 +9,9 @@ config VHOST_NET
To compile this driver as a module, choose M here: the module will
be called vhost_net.
+config TCM_VHOST
+ tristate "TCM_VHOST fabric module (EXPERIMENTAL)"
+ depends on TARGET_CORE && EVENTFD && EXPERIMENTAL && m
+ default n
+ ---help---
+ Say M here to enable the TCM_VHOST fabric module for use with virtio-scsi guests
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 72dd020..b10c7b1 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -1,2 +1,3 @@
obj-$(CONFIG_VHOST_NET) += vhost_net.o
+obj-$(CONFIG_TCM_VHOST) += tcm_vhost.o
vhost_net-y := vhost.o net.o
diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
new file mode 100644
index 0000000..cd86633
--- /dev/null
+++ b/drivers/vhost/tcm_vhost.c
@@ -0,0 +1,1592 @@
+/*******************************************************************************
+ * Vhost kernel TCM fabric driver for virtio SCSI initiators
+ *
+ * (C) Copyright 2010-2012 RisingTide Systems LLC.
+ * (C) Copyright 2010-2012 IBM Corp.
+ *
+ * Licensed to the Linux Foundation under the General Public License (GPL) version 2.
+ *
+ * Authors: Nicholas A. Bellinger <nab@risingtidesystems.com>
+ * Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ ****************************************************************************/
+
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <generated/utsrelease.h>
+#include <linux/utsname.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/kthread.h>
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/configfs.h>
+#include <linux/ctype.h>
+#include <linux/compat.h>
+#include <linux/eventfd.h>
+#include <linux/vhost.h>
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <asm/unaligned.h>
+#include <scsi/scsi.h>
+#include <scsi/scsi_tcq.h>
+#include <target/target_core_base.h>
+#include <target/target_core_fabric.h>
+#include <target/target_core_fabric_configfs.h>
+#include <target/target_core_configfs.h>
+#include <target/configfs_macros.h>
+#include <linux/vhost.h>
+#include <linux/virtio_net.h> /* TODO vhost.h currently depends on this */
+#include <linux/virtio_scsi.h>
+
+#include "vhost.c"
+#include "vhost.h"
+#include "tcm_vhost.h"
+
+struct vhost_scsi {
+ atomic_t vhost_ref_cnt;
+ struct tcm_vhost_tpg *vs_tpg;
+ struct vhost_dev dev;
+ struct vhost_virtqueue vqs[3];
+
+ struct vhost_work vs_completion_work; /* cmd completion work item */
+ struct list_head vs_completion_list; /* cmd completion queue */
+ spinlock_t vs_completion_lock; /* protects vs_completion_list */
+};
+
+/* Local pointer to allocated TCM configfs fabric module */
+struct target_fabric_configfs *tcm_vhost_fabric_configfs;
+
+/* Global mutex to protect tcm_vhost TPG list for vhost IOCTL access */
+DEFINE_MUTEX(tcm_vhost_mutex);
+LIST_HEAD(tcm_vhost_list);
+
+static int tcm_vhost_check_true(struct se_portal_group *se_tpg)
+{
+ return 1;
+}
+
+static int tcm_vhost_check_false(struct se_portal_group *se_tpg)
+{
+ return 0;
+}
+
+static char *tcm_vhost_get_fabric_name(void)
+{
+ return "vhost";
+}
+
+static u8 tcm_vhost_get_fabric_proto_ident(struct se_portal_group *se_tpg)
+{
+ struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+ struct tcm_vhost_tpg, se_tpg);
+ struct tcm_vhost_tport *tport = tpg->tport;
+
+ switch (tport->tport_proto_id) {
+ case SCSI_PROTOCOL_SAS:
+ return sas_get_fabric_proto_ident(se_tpg);
+ case SCSI_PROTOCOL_FCP:
+ return fc_get_fabric_proto_ident(se_tpg);
+ case SCSI_PROTOCOL_ISCSI:
+ return iscsi_get_fabric_proto_ident(se_tpg);
+ default:
+ pr_err("Unknown tport_proto_id: 0x%02x, using"
+ " SAS emulation\n", tport->tport_proto_id);
+ break;
+ }
+
+ return sas_get_fabric_proto_ident(se_tpg);
+}
+
+static char *tcm_vhost_get_fabric_wwn(struct se_portal_group *se_tpg)
+{
+ struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+ struct tcm_vhost_tpg, se_tpg);
+ struct tcm_vhost_tport *tport = tpg->tport;
+
+ return &tport->tport_name[0];
+}
+
+u16 tcm_vhost_get_tag(struct se_portal_group *se_tpg)
+{
+ struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+ struct tcm_vhost_tpg, se_tpg);
+ return tpg->tport_tpgt;
+}
+
+static u32 tcm_vhost_get_default_depth(struct se_portal_group *se_tpg)
+{
+ return 1;
+}
+
+static u32 tcm_vhost_get_pr_transport_id(
+ struct se_portal_group *se_tpg,
+ struct se_node_acl *se_nacl,
+ struct t10_pr_registration *pr_reg,
+ int *format_code,
+ unsigned char *buf)
+{
+ struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+ struct tcm_vhost_tpg, se_tpg);
+ struct tcm_vhost_tport *tport = tpg->tport;
+
+ switch (tport->tport_proto_id) {
+ case SCSI_PROTOCOL_SAS:
+ return sas_get_pr_transport_id(se_tpg, se_nacl, pr_reg,
+ format_code, buf);
+ case SCSI_PROTOCOL_FCP:
+ return fc_get_pr_transport_id(se_tpg, se_nacl, pr_reg,
+ format_code, buf);
+ case SCSI_PROTOCOL_ISCSI:
+ return iscsi_get_pr_transport_id(se_tpg, se_nacl, pr_reg,
+ format_code, buf);
+ default:
+ pr_err("Unknown tport_proto_id: 0x%02x, using"
+ " SAS emulation\n", tport->tport_proto_id);
+ break;
+ }
+
+ return sas_get_pr_transport_id(se_tpg, se_nacl, pr_reg,
+ format_code, buf);
+}
+
+static u32 tcm_vhost_get_pr_transport_id_len(
+ struct se_portal_group *se_tpg,
+ struct se_node_acl *se_nacl,
+ struct t10_pr_registration *pr_reg,
+ int *format_code)
+{
+ struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+ struct tcm_vhost_tpg, se_tpg);
+ struct tcm_vhost_tport *tport = tpg->tport;
+
+ switch (tport->tport_proto_id) {
+ case SCSI_PROTOCOL_SAS:
+ return sas_get_pr_transport_id_len(se_tpg, se_nacl, pr_reg,
+ format_code);
+ case SCSI_PROTOCOL_FCP:
+ return fc_get_pr_transport_id_len(se_tpg, se_nacl, pr_reg,
+ format_code);
+ case SCSI_PROTOCOL_ISCSI:
+ return iscsi_get_pr_transport_id_len(se_tpg, se_nacl, pr_reg,
+ format_code);
+ default:
+ pr_err("Unknown tport_proto_id: 0x%02x, using"
+ " SAS emulation\n", tport->tport_proto_id);
+ break;
+ }
+
+ return sas_get_pr_transport_id_len(se_tpg, se_nacl, pr_reg,
+ format_code);
+}
+
+static char *tcm_vhost_parse_pr_out_transport_id(
+ struct se_portal_group *se_tpg,
+ const char *buf,
+ u32 *out_tid_len,
+ char **port_nexus_ptr)
+{
+ struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+ struct tcm_vhost_tpg, se_tpg);
+ struct tcm_vhost_tport *tport = tpg->tport;
+
+ switch (tport->tport_proto_id) {
+ case SCSI_PROTOCOL_SAS:
+ return sas_parse_pr_out_transport_id(se_tpg, buf, out_tid_len,
+ port_nexus_ptr);
+ case SCSI_PROTOCOL_FCP:
+ return fc_parse_pr_out_transport_id(se_tpg, buf, out_tid_len,
+ port_nexus_ptr);
+ case SCSI_PROTOCOL_ISCSI:
+ return iscsi_parse_pr_out_transport_id(se_tpg, buf, out_tid_len,
+ port_nexus_ptr);
+ default:
+ pr_err("Unknown tport_proto_id: 0x%02x, using"
+ " SAS emulation\n", tport->tport_proto_id);
+ break;
+ }
+
+ return sas_parse_pr_out_transport_id(se_tpg, buf, out_tid_len,
+ port_nexus_ptr);
+}
+
+static struct se_node_acl *tcm_vhost_alloc_fabric_acl(struct se_portal_group *se_tpg)
+{
+ struct tcm_vhost_nacl *nacl;
+
+ nacl = kzalloc(sizeof(struct tcm_vhost_nacl), GFP_KERNEL);
+ if (!nacl) {
+ pr_err("Unable to allocate struct tcm_vhost_nacl\n");
+ return NULL;
+ }
+
+ return &nacl->se_node_acl;
+}
+
+static void tcm_vhost_release_fabric_acl(
+ struct se_portal_group *se_tpg,
+ struct se_node_acl *se_nacl)
+{
+ struct tcm_vhost_nacl *nacl = container_of(se_nacl,
+ struct tcm_vhost_nacl, se_node_acl);
+ kfree(nacl);
+}
+
+static u32 tcm_vhost_tpg_get_inst_index(struct se_portal_group *se_tpg)
+{
+ return 1;
+}
+
+/*
+ * Called by struct target_core_fabric_ops->new_cmd_map()
+ *
+ * Always called in process context. A non-zero return value here
+ * signals the caller to handle an exception based on the return code.
+ */
+static int tcm_vhost_new_cmd_map(struct se_cmd *se_cmd)
+{
+ struct tcm_vhost_cmd *tv_cmd = container_of(se_cmd,
+ struct tcm_vhost_cmd, tvc_se_cmd);
+ struct scatterlist *sg_ptr, *sg_bidi_ptr = NULL;
+ u32 sg_no_bidi = 0;
+ int ret;
+ /*
+ * Allocate the necessary tasks to complete the received CDB+data
+ */
+ ret = target_setup_cmd_from_cdb(se_cmd, tv_cmd->tvc_cdb);
+ if (ret != 0)
+ return ret;
+ /*
+ * Setup the struct scatterlist memory from the received
+ * struct tcm_vhost_cmd.
+ */
+ if (tv_cmd->tvc_sgl_count) {
+ sg_ptr = tv_cmd->tvc_sgl;
+ /*
+ * For BIDI commands, pass in the extra READ buffer
+ * to transport_generic_map_mem_to_cmd() below.
+ */
+/* FIXME: Fix BIDI operation in tcm_vhost_new_cmd_map() */
+#if 0
+ if (se_cmd->se_cmd_flags & SCF_BIDI) {
+ sg_bidi_ptr = NULL;
+ sg_no_bidi = 0;
+ }
+#endif
+ } else {
+ /*
+ * Used for DMA_NONE
+ */
+ sg_ptr = NULL;
+ }
+
+ /* Tell the core about our preallocated memory */
+ return transport_generic_map_mem_to_cmd(se_cmd, sg_ptr,
+ tv_cmd->tvc_sgl_count, sg_bidi_ptr,
+ sg_no_bidi);
+}
+
+static void tcm_vhost_release_cmd(struct se_cmd *se_cmd)
+{
+ return;
+}
+
+static int tcm_vhost_shutdown_session(struct se_session *se_sess)
+{
+ return 0;
+}
+
+static void tcm_vhost_close_session(struct se_session *se_sess)
+{
+ return;
+}
+
+static u32 tcm_vhost_sess_get_index(struct se_session *se_sess)
+{
+ return 0;
+}
+
+static int tcm_vhost_write_pending(struct se_cmd *se_cmd)
+{
+ /* Go ahead and process the write immediately */
+ transport_generic_process_write(se_cmd);
+ return 0;
+}
+
+static int tcm_vhost_write_pending_status(struct se_cmd *se_cmd)
+{
+ return 0;
+}
+
+static void tcm_vhost_set_default_node_attrs(struct se_node_acl *nacl)
+{
+ return;
+}
+
+static u32 tcm_vhost_get_task_tag(struct se_cmd *se_cmd)
+{
+ return 0;
+}
+
+static int tcm_vhost_get_cmd_state(struct se_cmd *se_cmd)
+{
+ return 0;
+}
+
+static void vhost_scsi_complete_cmd(struct tcm_vhost_cmd *);
+
+static int tcm_vhost_queue_data_in(struct se_cmd *se_cmd)
+{
+ struct tcm_vhost_cmd *tv_cmd = container_of(se_cmd,
+ struct tcm_vhost_cmd, tvc_se_cmd);
+ vhost_scsi_complete_cmd(tv_cmd);
+ return 0;
+}
+
+static int tcm_vhost_queue_status(struct se_cmd *se_cmd)
+{
+ struct tcm_vhost_cmd *tv_cmd = container_of(se_cmd,
+ struct tcm_vhost_cmd, tvc_se_cmd);
+ vhost_scsi_complete_cmd(tv_cmd);
+ return 0;
+}
+
+static int tcm_vhost_queue_tm_rsp(struct se_cmd *se_cmd)
+{
+ return 0;
+}
+
+static u16 tcm_vhost_set_fabric_sense_len(struct se_cmd *se_cmd, u32 sense_length)
+{
+ return 0;
+}
+
+static u16 tcm_vhost_get_fabric_sense_len(void)
+{
+ return 0;
+}
+
+static void vhost_scsi_free_cmd(struct tcm_vhost_cmd *tv_cmd)
+{
+ struct se_cmd *se_cmd = &tv_cmd->tvc_se_cmd;
+
+ /* TODO locking against target/backend threads? */
+ transport_generic_free_cmd(se_cmd, 1);
+
+ if (tv_cmd->tvc_sgl_count) {
+ u32 i;
+ for (i = 0; i < tv_cmd->tvc_sgl_count; i++)
+ put_page(sg_page(&tv_cmd->tvc_sgl[i]));
+ }
+
+ kfree(tv_cmd);
+}
+
+/* Dequeue a command from the completion list */
+static struct tcm_vhost_cmd *vhost_scsi_get_cmd_from_completion(struct vhost_scsi *vs)
+{
+ struct tcm_vhost_cmd *tv_cmd = NULL;
+
+ spin_lock_bh(&vs->vs_completion_lock);
+ if (list_empty(&vs->vs_completion_list)) {
+ spin_unlock_bh(&vs->vs_completion_lock);
+ return NULL;
+ }
+
+ list_for_each_entry(tv_cmd, &vs->vs_completion_list,
+ tvc_completion_list) {
+ list_del(&tv_cmd->tvc_completion_list);
+ break;
+ }
+ spin_unlock_bh(&vs->vs_completion_lock);
+ return tv_cmd;
+}
+
+/* Fill in status and signal that we are done processing this command
+ *
+ * This is scheduled in the vhost work queue so we are called with the owner
+ * process mm and can access the vring.
+ */
+static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
+{
+ struct vhost_scsi *vs = container_of(work, struct vhost_scsi,
+ vs_completion_work);
+ struct tcm_vhost_cmd *tv_cmd;
+
+ while ((tv_cmd = vhost_scsi_get_cmd_from_completion(vs)) != NULL) {
+ struct virtio_scsi_cmd_resp v_rsp;
+ struct se_cmd *se_cmd = &tv_cmd->tvc_se_cmd;
+ int ret;
+
+ pr_debug("%s tv_cmd %p resid %u status %#02x\n", __func__,
+ tv_cmd, se_cmd->residual_count, se_cmd->scsi_status);
+
+ memset(&v_rsp, 0, sizeof(v_rsp));
+ v_rsp.resid = se_cmd->residual_count;
+ /* TODO is status_qualifier field needed? */
+ v_rsp.status = se_cmd->scsi_status;
+ v_rsp.sense_len = se_cmd->scsi_sense_length;
+ memcpy(v_rsp.sense, tv_cmd->tvc_sense_buf,
+ v_rsp.sense_len);
+ ret = copy_to_user(tv_cmd->tvc_resp, &v_rsp, sizeof(v_rsp));
+ if (likely(ret == 0))
+ vhost_add_used(&vs->vqs[2], tv_cmd->tvc_vq_desc, 0);
+ else
+ pr_err("Faulted on virtio_scsi_cmd_resp\n");
+
+ vhost_scsi_free_cmd(tv_cmd);
+ }
+
+ vhost_signal(&vs->dev, &vs->vqs[2]);
+}
+
+static void vhost_scsi_complete_cmd(struct tcm_vhost_cmd *tv_cmd)
+{
+ struct vhost_scsi *vs = tv_cmd->tvc_vhost;
+
+ pr_debug("%s tv_cmd %p\n", __func__, tv_cmd);
+
+ spin_lock_bh(&vs->vs_completion_lock);
+ list_add_tail(&tv_cmd->tvc_completion_list, &vs->vs_completion_list);
+ spin_unlock_bh(&vs->vs_completion_lock);
+
+ vhost_work_queue(&vs->dev, &vs->vs_completion_work);
+}
+
+static struct tcm_vhost_cmd *vhost_scsi_allocate_cmd(
+ struct tcm_vhost_tpg *tv_tpg,
+ struct virtio_scsi_cmd_req *v_req,
+ u32 exp_data_len,
+ int data_direction)
+{
+ struct tcm_vhost_cmd *tv_cmd;
+ struct tcm_vhost_nexus *tv_nexus;
+ struct se_portal_group *se_tpg = &tv_tpg->se_tpg;
+ struct se_session *se_sess;
+ struct se_cmd *se_cmd;
+ int sam_task_attr;
+
+ tv_nexus = tv_tpg->tpg_nexus;
+ if (!tv_nexus) {
+ pr_err("Unable to locate active struct tcm_vhost_nexus\n");
+ return ERR_PTR(-EIO);
+ }
+ se_sess = tv_nexus->tvn_se_sess;
+
+ tv_cmd = kzalloc(sizeof(struct tcm_vhost_cmd), GFP_ATOMIC);
+ if (!tv_cmd) {
+ pr_err("Unable to allocate struct tcm_vhost_cmd\n");
+ return ERR_PTR(-ENOMEM);
+ }
+ INIT_LIST_HEAD(&tv_cmd->tvc_completion_list);
+ tv_cmd->tvc_tag = v_req->tag;
+
+ se_cmd = &tv_cmd->tvc_se_cmd;
+ /*
+ * Locate the SAM Task Attr from virtio_scsi_cmd_req
+ */
+ sam_task_attr = v_req->task_attr;
+ /*
+ * Initialize struct se_cmd descriptor from target_core_mod infrastructure
+ */
+ transport_init_se_cmd(se_cmd, se_tpg->se_tpg_tfo, se_sess, exp_data_len,
+ data_direction, sam_task_attr,
+ &tv_cmd->tvc_sense_buf[0]);
+
+#if 0 /* FIXME: vhost_scsi_allocate_cmd() BIDI operation */
+ if (bidi)
+ se_cmd->se_cmd_flags |= SCF_BIDI;
+#endif
+ /*
+ * From here the rest of the se_cmd will be setup and dispatched
+ * via tcm_vhost_new_cmd_map() from TCM backend thread context
+ * after transport_generic_handle_cdb_map() has been called from
+ * vhost_scsi_handle_vq() below.
+ */
+ return tv_cmd;
+}
+
+/*
+ * Map a user memory range into a scatterlist
+ *
+ * Returns the number of scatterlist entries used or -errno on error.
+ */
+static int vhost_scsi_map_to_sgl(struct scatterlist *sgl,
+ unsigned int sgl_count,
+ void __user *ptr, size_t len, int write)
+{
+ struct scatterlist *sg = sgl;
+ unsigned int npages = 0;
+ int ret;
+
+ while (len > 0) {
+ struct page *page;
+ unsigned int offset = (uintptr_t)ptr & ~PAGE_MASK;
+ unsigned int nbytes = min(PAGE_SIZE - offset, len);
+
+ if (npages == sgl_count) {
+ ret = -ENOBUFS;
+ goto err;
+ }
+
+ ret = get_user_pages_fast((unsigned long)ptr, 1, write, &page);
+ BUG_ON(ret == 0); /* we should either get our page or fail */
+ if (ret < 0)
+ goto err;
+
+ sg_set_page(sg, page, nbytes, offset);
+ ptr += nbytes;
+ len -= nbytes;
+ sg++;
+ npages++;
+ }
+ return npages;
+
+err:
+ /* Put pages that we hold */
+ for (sg = sgl; sg != &sgl[npages]; sg++)
+ put_page(sg_page(sg));
+ return ret;
+}
+
+static int vhost_scsi_map_iov_to_sgl(struct tcm_vhost_cmd *tv_cmd,
+ struct iovec *iov, unsigned int niov,
+ int write)
+{
+ int ret;
+ unsigned int i;
+ u32 sgl_count;
+ struct scatterlist *sg;
+
+ /*
+ * Find out how long sglist needs to be
+ */
+ sgl_count = 0;
+ for (i = 0; i < niov; i++) {
+ sgl_count += (((uintptr_t)iov[i].iov_base + iov[i].iov_len +
+ PAGE_SIZE - 1) >> PAGE_SHIFT) -
+ ((uintptr_t)iov[i].iov_base >> PAGE_SHIFT);
+ }
+ /* TODO overflow checking */
+
+ sg = kmalloc(sizeof(tv_cmd->tvc_sgl[0]) * sgl_count, GFP_ATOMIC);
+ if (!sg)
+ return -ENOMEM;
+ pr_debug("%s sg %p sgl_count %u is_err %ld\n", __func__,
+ sg, sgl_count, IS_ERR(sg));
+ sg_init_table(sg, sgl_count);
+
+ tv_cmd->tvc_sgl = sg;
+ tv_cmd->tvc_sgl_count = sgl_count;
+
+ pr_debug("Mapping %u iovecs for %u pages\n", niov, sgl_count);
+ for (i = 0; i < niov; i++) {
+ ret = vhost_scsi_map_to_sgl(sg, sgl_count, iov[i].iov_base,
+ iov[i].iov_len, write);
+ if (ret < 0) {
+ for (i = 0; i < tv_cmd->tvc_sgl_count - sgl_count; i++)
+ put_page(sg_page(&tv_cmd->tvc_sgl[i]));
+ kfree(tv_cmd->tvc_sgl);
+ tv_cmd->tvc_sgl = NULL;
+ tv_cmd->tvc_sgl_count = 0;
+ return ret;
+ }
+
+ sg += ret;
+ sgl_count -= ret;
+ }
+ return 0;
+}
+
+static void vhost_scsi_handle_vq(struct vhost_scsi *vs)
+{
+ struct vhost_virtqueue *vq = &vs->vqs[2];
+ struct virtio_scsi_cmd_req v_req;
+ struct tcm_vhost_tpg *tv_tpg;
+ struct tcm_vhost_cmd *tv_cmd;
+ u32 exp_data_len, data_first, data_num, data_direction;
+ unsigned out, in, i;
+ int head, ret, lun;
+
+ /* Must use ioctl VHOST_SCSI_SET_ENDPOINT */
+ tv_tpg = vs->vs_tpg;
+ if (unlikely(!tv_tpg)) {
+ pr_err("%s endpoint not set\n", __func__);
+ return;
+ }
+
+ mutex_lock(&vq->mutex);
+ vhost_disable_notify(&vs->dev, vq);
+
+ for (;;) {
+ head = vhost_get_vq_desc(&vs->dev, vq, vq->iov,
+ ARRAY_SIZE(vq->iov), &out, &in,
+ NULL, NULL);
+ pr_debug("vhost_get_vq_desc: head: %d, out: %u in: %u\n", head, out, in);
+ /* On error, stop handling until the next kick. */
+ if (unlikely(head < 0))
+ break;
+ /* Nothing new? Wait for eventfd to tell us they refilled. */
+ if (head == vq->num) {
+ if (unlikely(vhost_enable_notify(&vs->dev, vq))) {
+ vhost_disable_notify(&vs->dev, vq);
+ continue;
+ }
+ break;
+ }
+
+/* FIXME: BIDI operation */
+ if (out == 1 && in == 1) {
+ data_direction = DMA_NONE;
+ data_first = 0;
+ data_num = 0;
+ } else if (out == 1 && in > 1) {
+ data_direction = DMA_FROM_DEVICE;
+ data_first = out + 1;
+ data_num = in - 1;
+ } else if (out > 1 && in == 1) {
+ data_direction = DMA_TO_DEVICE;
+ data_first = 1;
+ data_num = out - 1;
+ } else {
+ pr_err("Invalid buffer layout out: %u in: %u\n", out, in);
+ break;
+ }
+
+ /*
+ * Check for a sane resp buffer so we can report errors to
+ * the guest.
+ */
+ if (unlikely(vq->iov[out].iov_len !=
+ sizeof(struct virtio_scsi_cmd_resp))) {
+ pr_err("Expecting virtio_scsi_cmd_resp, got %zu bytes\n",
+ vq->iov[out].iov_len);
+ break;
+ }
+
+ if (unlikely(vq->iov[0].iov_len != sizeof(v_req))) {
+ pr_err("Expecting virtio_scsi_cmd_req, got %zu bytes\n",
+ vq->iov[0].iov_len);
+ break;
+ }
+ pr_debug("Calling __copy_from_user: vq->iov[0].iov_base: %p, len: %zu\n",
+ vq->iov[0].iov_base, sizeof(v_req));
+ ret = __copy_from_user(&v_req, vq->iov[0].iov_base, sizeof(v_req));
+ if (unlikely(ret)) {
+ pr_err("Faulted on virtio_scsi_cmd_req\n");
+ break;
+ }
+
+ exp_data_len = 0;
+ for (i = 0; i < data_num; i++) {
+ exp_data_len += vq->iov[data_first + i].iov_len;
+ }
+
+ tv_cmd = vhost_scsi_allocate_cmd(tv_tpg, &v_req,
+ exp_data_len, data_direction);
+ if (IS_ERR(tv_cmd)) {
+ pr_err("vhost_scsi_allocate_cmd failed %ld\n", PTR_ERR(tv_cmd));
+ break;
+ }
+ pr_debug("Allocated tv_cmd: %p exp_data_len: %u, data_direction: %u\n",
+ tv_cmd, exp_data_len, data_direction);
+
+ tv_cmd->tvc_vhost = vs;
+
+ if (unlikely(vq->iov[out].iov_len !=
+ sizeof(struct virtio_scsi_cmd_resp))) {
+ pr_err("Expecting virtio_scsi_cmd_resp,"
+ " got %zu bytes, out: %u, in: %u\n", vq->iov[out].iov_len, out, in);
+ break;
+ }
+
+ tv_cmd->tvc_resp = vq->iov[out].iov_base;
+
+ /*
+ * Copy the received CDB descriptor into tv_cmd->tvc_cdb
+ * that will be used by tcm_vhost_new_cmd_map() and down into
+ * target_setup_cmd_from_cdb()
+ */
+ memcpy(tv_cmd->tvc_cdb, v_req.cdb, TCM_VHOST_MAX_CDB_SIZE);
+ /*
+ * Check that the received CDB size does not exceed our
+ * hardcoded maximum for tcm_vhost
+ */
+ /* TODO what if cdb was too small for varlen cdb header? */
+ if (unlikely(scsi_command_size(tv_cmd->tvc_cdb) > TCM_VHOST_MAX_CDB_SIZE)) {
+ pr_err("Received SCSI CDB with command_size: %d that exceeds"
+ " TCM_VHOST_MAX_CDB_SIZE: %d\n",
+ scsi_command_size(tv_cmd->tvc_cdb), TCM_VHOST_MAX_CDB_SIZE);
+ break; /* TODO */
+ }
+ lun = ((v_req.lun[2] << 8) | v_req.lun[3]) & 0x3FFF;
+
+ pr_debug("vhost_scsi got command opcode: %#02x, lun: %d\n",
+ tv_cmd->tvc_cdb[0], lun);
+
+ if (data_direction != DMA_NONE) {
+ ret = vhost_scsi_map_iov_to_sgl(tv_cmd, &vq->iov[data_first],
+ data_num, data_direction == DMA_TO_DEVICE);
+ if (unlikely(ret)) {
+ pr_err("Failed to map iov to sgl\n");
+ break; /* TODO */
+ }
+ }
+
+ /*
+ * Save the descriptor from vhost_get_vq_desc() to be used to
+ * complete the virtio-scsi request in TCM callback context via
+ * tcm_vhost_queue_data_in() and tcm_vhost_queue_status()
+ */
+ tv_cmd->tvc_vq_desc = head;
+ /*
+ * Locate the struct se_lun pointer based on v_req->lun, and
+ * attach it to struct se_cmd
+ */
+ if (transport_lookup_cmd_lun(&tv_cmd->tvc_se_cmd, lun) < 0) {
+ pr_err("Failed to look up lun: %d\n", lun);
+ /* NON_EXISTENT_LUN */
+ transport_send_check_condition_and_sense(&tv_cmd->tvc_se_cmd,
+ tv_cmd->tvc_se_cmd.scsi_sense_reason, 0);
+ continue;
+ }
+ /*
+ * Now queue up the newly allocated se_cmd to be processed
+ * within TCM thread context to finish the setup and be dispatched
+ * into a TCM backend struct se_device.
+ */
+ transport_generic_handle_cdb_map(&tv_cmd->tvc_se_cmd);
+ }
+
+ mutex_unlock(&vq->mutex);
+}
+
+static void vhost_scsi_ctl_handle_kick(struct vhost_work *work)
+{
+ pr_err("%s: The handling func for control queue.\n", __func__);
+}
+
+static void vhost_scsi_evt_handle_kick(struct vhost_work *work)
+{
+ pr_err("%s: The handling func for event queue.\n", __func__);
+}
+
+static void vhost_scsi_handle_kick(struct vhost_work *work)
+{
+ struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
+ poll.work);
+ struct vhost_scsi *vs = container_of(vq->dev, struct vhost_scsi, dev);
+
+ vhost_scsi_handle_vq(vs);
+}
+
+/*
+ * Called from vhost_scsi_ioctl() context to walk the list of available tcm_vhost_tpg
+ * with an active struct tcm_vhost_nexus
+ */
+static int vhost_scsi_set_endpoint(
+ struct vhost_scsi *vs,
+ struct vhost_vring_target *t)
+{
+ struct tcm_vhost_tport *tv_tport;
+ struct tcm_vhost_tpg *tv_tpg;
+ int index;
+
+ mutex_lock(&vs->dev.mutex);
+ /* Verify that ring has been setup correctly. */
+ for (index = 0; index < vs->dev.nvqs; ++index) {
+ /* Verify that ring has been setup correctly. */
+ if (!vhost_vq_access_ok(&vs->vqs[index])) {
+ mutex_unlock(&vs->dev.mutex);
+ return -EFAULT;
+ }
+ }
+
+ if (vs->vs_tpg) {
+ mutex_unlock(&vs->dev.mutex);
+ return -EEXIST;
+ }
+ mutex_unlock(&vs->dev.mutex);
+
+ mutex_lock(&tcm_vhost_mutex);
+ list_for_each_entry(tv_tpg, &tcm_vhost_list, tv_tpg_list) {
+ mutex_lock(&tv_tpg->tv_tpg_mutex);
+ if (!tv_tpg->tpg_nexus) {
+ mutex_unlock(&tv_tpg->tv_tpg_mutex);
+ continue;
+ }
+ if (atomic_read(&tv_tpg->tv_tpg_vhost_count)) {
+ mutex_unlock(&tv_tpg->tv_tpg_mutex);
+ continue;
+ }
+ tv_tport = tv_tpg->tport;
+
+ if (!strcmp(tv_tport->tport_name, t->vhost_wwpn) &&
+ (tv_tpg->tport_tpgt == t->vhost_tpgt)) {
+ atomic_inc(&tv_tpg->tv_tpg_vhost_count);
+ smp_mb__after_atomic_inc();
+ mutex_unlock(&tv_tpg->tv_tpg_mutex);
+ mutex_unlock(&tcm_vhost_mutex);
+
+ mutex_lock(&vs->dev.mutex);
+ vs->vs_tpg = tv_tpg;
+ atomic_inc(&vs->vhost_ref_cnt);
+ smp_mb__after_atomic_inc();
+ mutex_unlock(&vs->dev.mutex);
+ return 0;
+ }
+ mutex_unlock(&tv_tpg->tv_tpg_mutex);
+ }
+ mutex_unlock(&tcm_vhost_mutex);
+ return -EINVAL;
+}
+
+static int vhost_scsi_clear_endpoint(
+ struct vhost_scsi *vs,
+ struct vhost_vring_target *t)
+{
+ struct tcm_vhost_tport *tv_tport;
+ struct tcm_vhost_tpg *tv_tpg;
+ int index;
+
+ mutex_lock(&vs->dev.mutex);
+ /* Verify that ring has been setup correctly. */
+ for (index = 0; index < vs->dev.nvqs; ++index) {
+ if (!vhost_vq_access_ok(&vs->vqs[index])) {
+ mutex_unlock(&vs->dev.mutex);
+ return -EFAULT;
+ }
+ }
+
+ if (!vs->vs_tpg) {
+ mutex_unlock(&vs->dev.mutex);
+ return -ENODEV;
+ }
+ tv_tpg = vs->vs_tpg;
+ tv_tport = tv_tpg->tport;
+
+ if (strcmp(tv_tport->tport_name, t->vhost_wwpn) ||
+ (tv_tpg->tport_tpgt != t->vhost_tpgt)) {
+ mutex_unlock(&vs->dev.mutex);
+ pr_warn("tv_tport->tport_name: %s, tv_tpg->tport_tpgt: %hu"
+ " does not match t->vhost_wwpn: %s, t->vhost_tpgt: %hu\n",
+ tv_tport->tport_name, tv_tpg->tport_tpgt,
+ t->vhost_wwpn, t->vhost_tpgt);
+ return -EINVAL;
+ }
+ atomic_dec(&tv_tpg->tv_tpg_vhost_count);
+ vs->vs_tpg = NULL;
+ mutex_unlock(&vs->dev.mutex);
+
+ return 0;
+}
+
+static int vhost_scsi_open(struct inode *inode, struct file *f)
+{
+ struct vhost_scsi *s;
+ int r;
+
+ s = kzalloc(sizeof(*s), GFP_KERNEL);
+ if (!s)
+ return -ENOMEM;
+
+ vhost_work_init(&s->vs_completion_work, vhost_scsi_complete_cmd_work);
+ INIT_LIST_HEAD(&s->vs_completion_list);
+ spin_lock_init(&s->vs_completion_lock);
+
+ s->vqs[0].handle_kick = vhost_scsi_ctl_handle_kick;
+ s->vqs[1].handle_kick = vhost_scsi_evt_handle_kick;
+ s->vqs[2].handle_kick = vhost_scsi_handle_kick;
+ r = vhost_dev_init(&s->dev, s->vqs, 3);
+ if (r < 0) {
+ kfree(s);
+ return r;
+ }
+
+ f->private_data = s;
+ return 0;
+}
+
+static int vhost_scsi_release(struct inode *inode, struct file *f)
+{
+ struct vhost_scsi *s = f->private_data;
+
+ if (s->vs_tpg && s->vs_tpg->tport) {
+ struct vhost_vring_target backend;
+ memcpy(backend.vhost_wwpn, s->vs_tpg->tport->tport_name, sizeof(backend.vhost_wwpn));
+ backend.vhost_tpgt = s->vs_tpg->tport_tpgt;
+ vhost_scsi_clear_endpoint(s, &backend);
+ }
+
+ vhost_dev_cleanup(&s->dev, false);
+ kfree(s);
+ return 0;
+}
+
+static int vhost_scsi_set_features(struct vhost_scsi *vs, u64 features)
+{
+ if (features & ~VHOST_FEATURES)
+ return -EOPNOTSUPP;
+
+ mutex_lock(&vs->dev.mutex);
+ if ((features & (1 << VHOST_F_LOG_ALL)) &&
+ !vhost_log_access_ok(&vs->dev)) {
+ mutex_unlock(&vs->dev.mutex);
+ return -EFAULT;
+ }
+ vs->dev.acked_features = features;
+ /* TODO possibly smp_wmb() and flush vqs */
+ mutex_unlock(&vs->dev.mutex);
+ return 0;
+}
+
+static long vhost_scsi_ioctl(struct file *f, unsigned int ioctl,
+ unsigned long arg)
+{
+ struct vhost_scsi *vs = f->private_data;
+ struct vhost_vring_target backend;
+ void __user *argp = (void __user *)arg;
+ u64 __user *featurep = argp;
+ u64 features;
+ int r;
+
+ switch (ioctl) {
+ case VHOST_SCSI_SET_ENDPOINT:
+ if (copy_from_user(&backend, argp, sizeof backend))
+ return -EFAULT;
+
+ return vhost_scsi_set_endpoint(vs, &backend);
+ case VHOST_SCSI_CLEAR_ENDPOINT:
+ if (copy_from_user(&backend, argp, sizeof backend))
+ return -EFAULT;
+
+ return vhost_scsi_clear_endpoint(vs, &backend);
+ case VHOST_GET_FEATURES:
+ features = VHOST_FEATURES;
+ if (copy_to_user(featurep, &features, sizeof features))
+ return -EFAULT;
+ return 0;
+ case VHOST_SET_FEATURES:
+ if (copy_from_user(&features, featurep, sizeof features))
+ return -EFAULT;
+ return vhost_scsi_set_features(vs, features);
+ default:
+ mutex_lock(&vs->dev.mutex);
+ r = vhost_dev_ioctl(&vs->dev, ioctl, arg);
+ mutex_unlock(&vs->dev.mutex);
+ return r;
+ }
+}
+
+static const struct file_operations vhost_scsi_fops = {
+ .owner = THIS_MODULE,
+ .release = vhost_scsi_release,
+ .unlocked_ioctl = vhost_scsi_ioctl,
+ /* TODO compat ioctl? */
+ .open = vhost_scsi_open,
+ .llseek = noop_llseek,
+};
+
+static struct miscdevice vhost_scsi_misc = {
+ MISC_DYNAMIC_MINOR,
+ "vhost-scsi",
+ &vhost_scsi_fops,
+};
+
+static int __init vhost_scsi_register(void)
+{
+ return misc_register(&vhost_scsi_misc);
+}
+
+static int vhost_scsi_deregister(void)
+{
+ return misc_deregister(&vhost_scsi_misc);
+}
+
+static char *tcm_vhost_dump_proto_id(struct tcm_vhost_tport *tport)
+{
+ switch (tport->tport_proto_id) {
+ case SCSI_PROTOCOL_SAS:
+ return "SAS";
+ case SCSI_PROTOCOL_FCP:
+ return "FCP";
+ case SCSI_PROTOCOL_ISCSI:
+ return "iSCSI";
+ default:
+ break;
+ }
+
+ return "Unknown";
+}
+
+static int tcm_vhost_port_link(
+ struct se_portal_group *se_tpg,
+ struct se_lun *lun)
+{
+ struct tcm_vhost_tpg *tv_tpg = container_of(se_tpg,
+ struct tcm_vhost_tpg, se_tpg);
+
+ atomic_inc(&tv_tpg->tv_tpg_port_count);
+ smp_mb__after_atomic_inc();
+
+ return 0;
+}
+
+static void tcm_vhost_port_unlink(
+ struct se_portal_group *se_tpg,
+ struct se_lun *se_lun)
+{
+ struct tcm_vhost_tpg *tv_tpg = container_of(se_tpg,
+ struct tcm_vhost_tpg, se_tpg);
+
+ atomic_dec(&tv_tpg->tv_tpg_port_count);
+ smp_mb__after_atomic_dec();
+}
+
+static struct se_node_acl *tcm_vhost_make_nodeacl(
+ struct se_portal_group *se_tpg,
+ struct config_group *group,
+ const char *name)
+{
+ struct se_node_acl *se_nacl, *se_nacl_new;
+ struct tcm_vhost_nacl *nacl;
+ u64 wwpn = 0;
+ u32 nexus_depth;
+
+ /* if (tcm_vhost_parse_wwn(name, &wwpn, 1) < 0)
+ return ERR_PTR(-EINVAL); */
+ se_nacl_new = tcm_vhost_alloc_fabric_acl(se_tpg);
+ if (!se_nacl_new)
+ return ERR_PTR(-ENOMEM);
+ /* FIXME: Hardcoded nexus depth in tcm_vhost_make_nodeacl() */
+ nexus_depth = 1;
+ /*
+ * se_nacl_new may be released by core_tpg_add_initiator_node_acl()
+ * when converting a NodeACL from demo mode -> explicit
+ */
+ se_nacl = core_tpg_add_initiator_node_acl(se_tpg, se_nacl_new,
+ name, nexus_depth);
+ if (IS_ERR(se_nacl)) {
+ tcm_vhost_release_fabric_acl(se_tpg, se_nacl_new);
+ return se_nacl;
+ }
+ /*
+ * Locate our struct tcm_vhost_nacl and set the FC Nport WWPN
+ */
+ nacl = container_of(se_nacl, struct tcm_vhost_nacl, se_node_acl);
+ nacl->iport_wwpn = wwpn;
+ /* tcm_vhost_format_wwn(&nacl->iport_name[0], TCM_VHOST_NAMELEN, wwpn); */
+
+ return se_nacl;
+}
+
+static void tcm_vhost_drop_nodeacl(struct se_node_acl *se_acl)
+{
+ struct tcm_vhost_nacl *nacl = container_of(se_acl,
+ struct tcm_vhost_nacl, se_node_acl);
+ core_tpg_del_initiator_node_acl(se_acl->se_tpg, se_acl, 1);
+ kfree(nacl);
+}
+
+static int tcm_vhost_make_nexus(
+ struct tcm_vhost_tpg *tv_tpg,
+ const char *name)
+{
+ struct se_portal_group *se_tpg;
+ struct tcm_vhost_nexus *tv_nexus;
+
+ mutex_lock(&tv_tpg->tv_tpg_mutex);
+ if (tv_tpg->tpg_nexus) {
+ mutex_unlock(&tv_tpg->tv_tpg_mutex);
+ pr_debug("tv_tpg->tpg_nexus already exists\n");
+ return -EEXIST;
+ }
+ se_tpg = &tv_tpg->se_tpg;
+
+ tv_nexus = kzalloc(sizeof(struct tcm_vhost_nexus), GFP_KERNEL);
+ if (!tv_nexus) {
+ mutex_unlock(&tv_tpg->tv_tpg_mutex);
+ pr_err("Unable to allocate struct tcm_vhost_nexus\n");
+ return -ENOMEM;
+ }
+ /*
+ * Initialize the struct se_session pointer
+ */
+ tv_nexus->tvn_se_sess = transport_init_session();
+ if (IS_ERR(tv_nexus->tvn_se_sess)) {
+ mutex_unlock(&tv_tpg->tv_tpg_mutex);
+ kfree(tv_nexus);
+ return -ENOMEM;
+ }
+ /*
+ * Since we are running in 'demo mode' this call will generate a
+ * struct se_node_acl for the tcm_vhost struct se_portal_group with
+ * the SCSI Initiator port name of the passed configfs group 'name'.
+ */
+ tv_nexus->tvn_se_sess->se_node_acl = core_tpg_check_initiator_node_acl(
+ se_tpg, (unsigned char *)name);
+ if (!tv_nexus->tvn_se_sess->se_node_acl) {
+ mutex_unlock(&tv_tpg->tv_tpg_mutex);
+ pr_debug("core_tpg_check_initiator_node_acl() failed"
+ " for %s\n", name);
+ transport_free_session(tv_nexus->tvn_se_sess);
+ kfree(tv_nexus);
+ return -ENOMEM;
+ }
+ /*
+ * Now register the TCM vHost virtual I_T Nexus as active with the
+ * call to __transport_register_session()
+ */
+ __transport_register_session(se_tpg, tv_nexus->tvn_se_sess->se_node_acl,
+ tv_nexus->tvn_se_sess, tv_nexus);
+ tv_tpg->tpg_nexus = tv_nexus;
+
+ mutex_unlock(&tv_tpg->tv_tpg_mutex);
+ return 0;
+}
+
+static int tcm_vhost_drop_nexus(
+ struct tcm_vhost_tpg *tpg)
+{
+ struct se_session *se_sess;
+ struct tcm_vhost_nexus *tv_nexus;
+
+ mutex_lock(&tpg->tv_tpg_mutex);
+ tv_nexus = tpg->tpg_nexus;
+ if (!tv_nexus) {
+ mutex_unlock(&tpg->tv_tpg_mutex);
+ return -ENODEV;
+ }
+
+ se_sess = tv_nexus->tvn_se_sess;
+ if (!se_sess) {
+ mutex_unlock(&tpg->tv_tpg_mutex);
+ return -ENODEV;
+ }
+
+ if (atomic_read(&tpg->tv_tpg_port_count)) {
+ mutex_unlock(&tpg->tv_tpg_mutex);
+ pr_err("Unable to remove TCM_vHost I_T Nexus with"
+ " active TPG port count: %d\n",
+ atomic_read(&tpg->tv_tpg_port_count));
+ return -EPERM;
+ }
+
+ if (atomic_read(&tpg->tv_tpg_vhost_count)) {
+ mutex_unlock(&tpg->tv_tpg_mutex);
+ pr_err("Unable to remove TCM_vHost I_T Nexus with"
+ " active TPG vhost count: %d\n", atomic_read(&tpg->tv_tpg_vhost_count));
+ return -EPERM;
+ }
+
+ pr_debug("TCM_vHost_ConfigFS: Removing I_T Nexus to emulated"
+ " %s Initiator Port: %s\n", tcm_vhost_dump_proto_id(tpg->tport),
+ tv_nexus->tvn_se_sess->se_node_acl->initiatorname);
+ /*
+ * Release the SCSI I_T Nexus to the emulated vHost Target Port
+ */
+ transport_deregister_session(tv_nexus->tvn_se_sess);
+ tpg->tpg_nexus = NULL;
+ mutex_unlock(&tpg->tv_tpg_mutex);
+
+ kfree(tv_nexus);
+ return 0;
+}
+
+static ssize_t tcm_vhost_tpg_show_nexus(
+ struct se_portal_group *se_tpg,
+ char *page)
+{
+ struct tcm_vhost_tpg *tv_tpg = container_of(se_tpg,
+ struct tcm_vhost_tpg, se_tpg);
+ struct tcm_vhost_nexus *tv_nexus;
+ ssize_t ret;
+
+ mutex_lock(&tv_tpg->tv_tpg_mutex);
+ tv_nexus = tv_tpg->tpg_nexus;
+ if (!tv_nexus) {
+ mutex_unlock(&tv_tpg->tv_tpg_mutex);
+ return -ENODEV;
+ }
+ ret = snprintf(page, PAGE_SIZE, "%s\n",
+ tv_nexus->tvn_se_sess->se_node_acl->initiatorname);
+ mutex_unlock(&tv_tpg->tv_tpg_mutex);
+
+ return ret;
+}
+
+static ssize_t tcm_vhost_tpg_store_nexus(
+ struct se_portal_group *se_tpg,
+ const char *page,
+ size_t count)
+{
+ struct tcm_vhost_tpg *tv_tpg = container_of(se_tpg,
+ struct tcm_vhost_tpg, se_tpg);
+ struct tcm_vhost_tport *tport_wwn = tv_tpg->tport;
+ unsigned char i_port[TCM_VHOST_NAMELEN], *ptr, *port_ptr;
+ int ret;
+ /*
+ * Shut down the active I_T nexus if 'NULL' is passed.
+ */
+ if (!strncmp(page, "NULL", 4)) {
+ ret = tcm_vhost_drop_nexus(tv_tpg);
+ return (!ret) ? count : ret;
+ }
+ /*
+ * Otherwise make sure the passed virtual Initiator port WWN matches
+ * the fabric protocol_id set in tcm_vhost_make_tport(), and call
+ * tcm_vhost_make_nexus().
+ */
+ if (strlen(page) > TCM_VHOST_NAMELEN) {
+ pr_err("Emulated NAA SAS Address: %s, exceeds"
+ " max: %d\n", page, TCM_VHOST_NAMELEN);
+ return -EINVAL;
+ }
+ snprintf(&i_port[0], TCM_VHOST_NAMELEN, "%s", page);
+
+ ptr = strstr(i_port, "naa.");
+ if (ptr) {
+ if (tport_wwn->tport_proto_id != SCSI_PROTOCOL_SAS) {
+ pr_err("Passed SAS Initiator Port %s does not"
+ " match target port protoid: %s\n", i_port,
+ tcm_vhost_dump_proto_id(tport_wwn));
+ return -EINVAL;
+ }
+ port_ptr = &i_port[0];
+ goto check_newline;
+ }
+ ptr = strstr(i_port, "fc.");
+ if (ptr) {
+ if (tport_wwn->tport_proto_id != SCSI_PROTOCOL_FCP) {
+ pr_err("Passed FCP Initiator Port %s does not"
+ " match target port protoid: %s\n", i_port,
+ tcm_vhost_dump_proto_id(tport_wwn));
+ return -EINVAL;
+ }
+ port_ptr = &i_port[3]; /* Skip over "fc." */
+ goto check_newline;
+ }
+ ptr = strstr(i_port, "iqn.");
+ if (ptr) {
+ if (tport_wwn->tport_proto_id != SCSI_PROTOCOL_ISCSI) {
+ pr_err("Passed iSCSI Initiator Port %s does not"
+ " match target port protoid: %s\n", i_port,
+ tcm_vhost_dump_proto_id(tport_wwn));
+ return -EINVAL;
+ }
+ port_ptr = &i_port[0];
+ goto check_newline;
+ }
+ pr_err("Unable to locate prefix for emulated Initiator Port:"
+ " %s\n", i_port);
+ return -EINVAL;
+ /*
+ * Clear any trailing newline for the NAA WWN
+ */
+check_newline:
+ if (i_port[strlen(i_port)-1] == '\n')
+ i_port[strlen(i_port)-1] = '\0';
+
+ ret = tcm_vhost_make_nexus(tv_tpg, port_ptr);
+ if (ret < 0)
+ return ret;
+
+ return count;
+}
+
+TF_TPG_BASE_ATTR(tcm_vhost, nexus, S_IRUGO | S_IWUSR);
+
+static struct configfs_attribute *tcm_vhost_tpg_attrs[] = {
+ &tcm_vhost_tpg_nexus.attr,
+ NULL,
+};
+
+static struct se_portal_group *tcm_vhost_make_tpg(
+ struct se_wwn *wwn,
+ struct config_group *group,
+ const char *name)
+{
+ struct tcm_vhost_tport *tport = container_of(wwn,
+ struct tcm_vhost_tport, tport_wwn);
+
+ struct tcm_vhost_tpg *tpg;
+ unsigned long tpgt;
+ int ret;
+
+ if (strstr(name, "tpgt_") != name)
+ return ERR_PTR(-EINVAL);
+ if (strict_strtoul(name + 5, 10, &tpgt) || tpgt > USHRT_MAX)
+ return ERR_PTR(-EINVAL);
+
+ tpg = kzalloc(sizeof(struct tcm_vhost_tpg), GFP_KERNEL);
+ if (!tpg) {
+ pr_err("Unable to allocate struct tcm_vhost_tpg\n");
+ return ERR_PTR(-ENOMEM);
+ }
+ mutex_init(&tpg->tv_tpg_mutex);
+ INIT_LIST_HEAD(&tpg->tv_tpg_list);
+ tpg->tport = tport;
+ tpg->tport_tpgt = tpgt;
+
+ ret = core_tpg_register(&tcm_vhost_fabric_configfs->tf_ops, wwn,
+ &tpg->se_tpg, tpg, TRANSPORT_TPG_TYPE_NORMAL);
+ if (ret < 0) {
+ kfree(tpg);
+ return NULL;
+ }
+ mutex_lock(&tcm_vhost_mutex);
+ list_add_tail(&tpg->tv_tpg_list, &tcm_vhost_list);
+ mutex_unlock(&tcm_vhost_mutex);
+
+ return &tpg->se_tpg;
+}
+
+static void tcm_vhost_drop_tpg(struct se_portal_group *se_tpg)
+{
+ struct tcm_vhost_tpg *tpg = container_of(se_tpg,
+ struct tcm_vhost_tpg, se_tpg);
+
+ mutex_lock(&tcm_vhost_mutex);
+ list_del(&tpg->tv_tpg_list);
+ mutex_unlock(&tcm_vhost_mutex);
+ /*
+ * Release the virtual I_T Nexus for this vHost TPG
+ */
+ tcm_vhost_drop_nexus(tpg);
+ /*
+ * Deregister the se_tpg from TCM.
+ */
+ core_tpg_deregister(se_tpg);
+ kfree(tpg);
+}
+
+static struct se_wwn *tcm_vhost_make_tport(
+ struct target_fabric_configfs *tf,
+ struct config_group *group,
+ const char *name)
+{
+ struct tcm_vhost_tport *tport;
+ char *ptr;
+ u64 wwpn = 0;
+ int off = 0;
+
+ /* if (tcm_vhost_parse_wwn(name, &wwpn, 1) < 0)
+ return ERR_PTR(-EINVAL); */
+
+ tport = kzalloc(sizeof(struct tcm_vhost_tport), GFP_KERNEL);
+ if (!tport) {
+ pr_err("Unable to allocate struct tcm_vhost_tport\n");
+ return ERR_PTR(-ENOMEM);
+ }
+ tport->tport_wwpn = wwpn;
+ /* tcm_vhost_format_wwn(&tport->tport_name[0], TCM_VHOST_NAMELEN, wwpn); */
+ /*
+ * Determine the emulated Protocol Identifier and Target Port Name
+ * based on the incoming configfs directory name.
+ */
+ ptr = strstr(name, "naa.");
+ if (ptr) {
+ tport->tport_proto_id = SCSI_PROTOCOL_SAS;
+ goto check_len;
+ }
+ ptr = strstr(name, "fc.");
+ if (ptr) {
+ tport->tport_proto_id = SCSI_PROTOCOL_FCP;
+ off = 3; /* Skip over "fc." */
+ goto check_len;
+ }
+ ptr = strstr(name, "iqn.");
+ if (ptr) {
+ tport->tport_proto_id = SCSI_PROTOCOL_ISCSI;
+ goto check_len;
+ }
+
+ pr_err("Unable to locate prefix for emulated Target Port: %s\n", name);
+ kfree(tport);
+ return ERR_PTR(-EINVAL);
+
+check_len:
+ if (strlen(name) > TCM_VHOST_NAMELEN) {
+ pr_err("Emulated %s Address: %s, exceeds"
+ " max: %d\n", tcm_vhost_dump_proto_id(tport), name,
+ TCM_VHOST_NAMELEN);
+ kfree(tport);
+ return ERR_PTR(-EINVAL);
+ }
+ snprintf(&tport->tport_name[0], TCM_VHOST_NAMELEN, "%s", &name[off]);
+
+ pr_debug("TCM_VHost_ConfigFS: Allocated emulated Target"
+ " %s Address: %s\n", tcm_vhost_dump_proto_id(tport), name);
+
+ return &tport->tport_wwn;
+}
+
+static void tcm_vhost_drop_tport(struct se_wwn *wwn)
+{
+ struct tcm_vhost_tport *tport = container_of(wwn,
+ struct tcm_vhost_tport, tport_wwn);
+
+ pr_debug("TCM_VHost_ConfigFS: Deallocating emulated Target"
+ " %s Address: %s\n", tcm_vhost_dump_proto_id(tport),
+ tport->tport_name);
+
+ kfree(tport);
+}
+
+static ssize_t tcm_vhost_wwn_show_attr_version(
+ struct target_fabric_configfs *tf,
+ char *page)
+{
+ return sprintf(page, "TCM_VHOST fabric module %s on %s/%s"
+ " on "UTS_RELEASE"\n", TCM_VHOST_VERSION, utsname()->sysname,
+ utsname()->machine);
+}
+
+TF_WWN_ATTR_RO(tcm_vhost, version);
+
+static struct configfs_attribute *tcm_vhost_wwn_attrs[] = {
+ &tcm_vhost_wwn_version.attr,
+ NULL,
+};
+
+static struct target_core_fabric_ops tcm_vhost_ops = {
+ .get_fabric_name = tcm_vhost_get_fabric_name,
+ .get_fabric_proto_ident = tcm_vhost_get_fabric_proto_ident,
+ .tpg_get_wwn = tcm_vhost_get_fabric_wwn,
+ .tpg_get_tag = tcm_vhost_get_tag,
+ .tpg_get_default_depth = tcm_vhost_get_default_depth,
+ .tpg_get_pr_transport_id = tcm_vhost_get_pr_transport_id,
+ .tpg_get_pr_transport_id_len = tcm_vhost_get_pr_transport_id_len,
+ .tpg_parse_pr_out_transport_id = tcm_vhost_parse_pr_out_transport_id,
+ .tpg_check_demo_mode = tcm_vhost_check_true,
+ .tpg_check_demo_mode_cache = tcm_vhost_check_true,
+ .tpg_check_demo_mode_write_protect = tcm_vhost_check_false,
+ .tpg_check_prod_mode_write_protect = tcm_vhost_check_false,
+ .tpg_alloc_fabric_acl = tcm_vhost_alloc_fabric_acl,
+ .tpg_release_fabric_acl = tcm_vhost_release_fabric_acl,
+ .tpg_get_inst_index = tcm_vhost_tpg_get_inst_index,
+ .new_cmd_map = tcm_vhost_new_cmd_map,
+ .release_cmd = tcm_vhost_release_cmd,
+ .shutdown_session = tcm_vhost_shutdown_session,
+ .close_session = tcm_vhost_close_session,
+ .sess_get_index = tcm_vhost_sess_get_index,
+ .sess_get_initiator_sid = NULL,
+ .write_pending = tcm_vhost_write_pending,
+ .write_pending_status = tcm_vhost_write_pending_status,
+ .set_default_node_attributes = tcm_vhost_set_default_node_attrs,
+ .get_task_tag = tcm_vhost_get_task_tag,
+ .get_cmd_state = tcm_vhost_get_cmd_state,
+ .queue_data_in = tcm_vhost_queue_data_in,
+ .queue_status = tcm_vhost_queue_status,
+ .queue_tm_rsp = tcm_vhost_queue_tm_rsp,
+ .get_fabric_sense_len = tcm_vhost_get_fabric_sense_len,
+ .set_fabric_sense_len = tcm_vhost_set_fabric_sense_len,
+ /*
+ * Setup function pointers for generic logic in target_core_fabric_configfs.c
+ */
+ .fabric_make_wwn = tcm_vhost_make_tport,
+ .fabric_drop_wwn = tcm_vhost_drop_tport,
+ .fabric_make_tpg = tcm_vhost_make_tpg,
+ .fabric_drop_tpg = tcm_vhost_drop_tpg,
+ .fabric_post_link = tcm_vhost_port_link,
+ .fabric_pre_unlink = tcm_vhost_port_unlink,
+ .fabric_make_np = NULL,
+ .fabric_drop_np = NULL,
+ .fabric_make_nodeacl = tcm_vhost_make_nodeacl,
+ .fabric_drop_nodeacl = tcm_vhost_drop_nodeacl,
+};
+
+static int tcm_vhost_register_configfs(void)
+{
+ struct target_fabric_configfs *fabric;
+ int ret;
+
+ pr_debug("TCM_VHOST fabric module %s on %s/%s"
+ " on "UTS_RELEASE"\n", TCM_VHOST_VERSION, utsname()->sysname,
+ utsname()->machine);
+ /*
+ * Register the top level struct config_item_type with TCM core
+ */
+ fabric = target_fabric_configfs_init(THIS_MODULE, "vhost");
+ if (IS_ERR(fabric)) {
+ pr_err("target_fabric_configfs_init() failed\n");
+ return PTR_ERR(fabric);
+ }
+ /*
+ * Setup fabric->tf_ops from our local tcm_vhost_ops
+ */
+ fabric->tf_ops = tcm_vhost_ops;
+ /*
+ * Setup default attribute lists for various fabric->tf_cit_tmpl
+ */
+ TF_CIT_TMPL(fabric)->tfc_wwn_cit.ct_attrs = tcm_vhost_wwn_attrs;
+ TF_CIT_TMPL(fabric)->tfc_tpg_base_cit.ct_attrs = tcm_vhost_tpg_attrs;
+ TF_CIT_TMPL(fabric)->tfc_tpg_attrib_cit.ct_attrs = NULL;
+ TF_CIT_TMPL(fabric)->tfc_tpg_param_cit.ct_attrs = NULL;
+ TF_CIT_TMPL(fabric)->tfc_tpg_np_base_cit.ct_attrs = NULL;
+ TF_CIT_TMPL(fabric)->tfc_tpg_nacl_base_cit.ct_attrs = NULL;
+ TF_CIT_TMPL(fabric)->tfc_tpg_nacl_attrib_cit.ct_attrs = NULL;
+ TF_CIT_TMPL(fabric)->tfc_tpg_nacl_auth_cit.ct_attrs = NULL;
+ TF_CIT_TMPL(fabric)->tfc_tpg_nacl_param_cit.ct_attrs = NULL;
+ /*
+ * Register the fabric for use within TCM
+ */
+ ret = target_fabric_configfs_register(fabric);
+ if (ret < 0) {
+ pr_err("target_fabric_configfs_register() failed"
+ " for TCM_VHOST\n");
+ return ret;
+ }
+ /*
+ * Setup our local pointer to *fabric
+ */
+ tcm_vhost_fabric_configfs = fabric;
+ pr_debug("TCM_VHOST[0] - Set fabric -> tcm_vhost_fabric_configfs\n");
+ return 0;
+}
+
+static void tcm_vhost_deregister_configfs(void)
+{
+ if (!tcm_vhost_fabric_configfs)
+ return;
+
+ target_fabric_configfs_deregister(tcm_vhost_fabric_configfs);
+ tcm_vhost_fabric_configfs = NULL;
+ pr_debug("TCM_VHOST[0] - Cleared tcm_vhost_fabric_configfs\n");
+}
+
+static int __init tcm_vhost_init(void)
+{
+ int ret;
+
+ ret = vhost_scsi_register();
+ if (ret < 0)
+ return ret;
+
+ ret = tcm_vhost_register_configfs();
+ if (ret < 0)
+ vhost_scsi_deregister();
+
+ return ret;
+}
+
+static void tcm_vhost_exit(void)
+{
+ tcm_vhost_deregister_configfs();
+ vhost_scsi_deregister();
+}
+
+MODULE_DESCRIPTION("TCM_VHOST series fabric driver");
+MODULE_LICENSE("GPL");
+module_init(tcm_vhost_init);
+module_exit(tcm_vhost_exit);
diff --git a/drivers/vhost/tcm_vhost.h b/drivers/vhost/tcm_vhost.h
new file mode 100644
index 0000000..0e8951b
--- /dev/null
+++ b/drivers/vhost/tcm_vhost.h
@@ -0,0 +1,70 @@
+#define TCM_VHOST_VERSION "v0.1"
+#define TCM_VHOST_NAMELEN 256
+#define TCM_VHOST_MAX_CDB_SIZE 32
+
+struct tcm_vhost_cmd {
+ /* Descriptor from vhost_get_vq_desc() for virt_queue segment */
+ int tvc_vq_desc;
+ /* The Tag from include/linux/virtio_scsi.h:struct virtio_scsi_cmd_req */
+ u64 tvc_tag;
+ /* The number of scatterlists associated with this cmd */
+ u32 tvc_sgl_count;
+ /* Pointer to the SGL formatted memory from virtio-scsi */
+ struct scatterlist *tvc_sgl;
+ /* Pointer to response */
+ struct virtio_scsi_cmd_resp __user *tvc_resp;
+ /* Pointer to vhost_scsi for our device */
+ struct vhost_scsi *tvc_vhost;
+ /* The TCM I/O descriptor that is accessed via container_of() */
+ struct se_cmd tvc_se_cmd;
+ /* Copy of the incoming SCSI command descriptor block (CDB) */
+ unsigned char tvc_cdb[TCM_VHOST_MAX_CDB_SIZE];
+ /* Sense buffer that will be mapped into outgoing status */
+ unsigned char tvc_sense_buf[TRANSPORT_SENSE_BUFFER];
+ /* Completed commands list, serviced from vhost worker thread */
+ struct list_head tvc_completion_list;
+};
+
+struct tcm_vhost_nexus {
+ /* Pointer to TCM session for I_T Nexus */
+ struct se_session *tvn_se_sess;
+};
+
+struct tcm_vhost_nacl {
+ /* Binary World Wide unique Port Name for Vhost Initiator port */
+ u64 iport_wwpn;
+ /* ASCII formatted WWPN for Sas Initiator port */
+ char iport_name[TCM_VHOST_NAMELEN];
+ /* Returned by tcm_vhost_make_nodeacl() */
+ struct se_node_acl se_node_acl;
+};
+
+struct tcm_vhost_tpg {
+ /* Vhost port target portal group tag for TCM */
+ u16 tport_tpgt;
+ /* Used to track number of TPG Port/Lun Links wrt explicit I_T Nexus shutdown */
+ atomic_t tv_tpg_port_count;
+ /* Used for vhost_scsi device reference to tpg_nexus */
+ atomic_t tv_tpg_vhost_count;
+ /* list for tcm_vhost_list */
+ struct list_head tv_tpg_list;
+ /* Used to protect access for tpg_nexus */
+ struct mutex tv_tpg_mutex;
+ /* Pointer to the TCM VHost I_T Nexus for this TPG endpoint */
+ struct tcm_vhost_nexus *tpg_nexus;
+ /* Pointer back to tcm_vhost_tport */
+ struct tcm_vhost_tport *tport;
+ /* Returned by tcm_vhost_make_tpg() */
+ struct se_portal_group se_tpg;
+};
+
+struct tcm_vhost_tport {
+ /* SCSI protocol the tport is providing */
+ u8 tport_proto_id;
+ /* Binary World Wide unique Port Name for Vhost Target port */
+ u64 tport_wwpn;
+ /* ASCII formatted WWPN for Vhost Target port */
+ char tport_name[TCM_VHOST_NAMELEN];
+ /* Returned by tcm_vhost_make_tport() */
+ struct se_wwn tport_wwn;
+};
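The comment on tvc_se_cmd notes that the enclosing tcm_vhost_cmd is reached via container_of(). A minimal userspace sketch of that idiom follows; the structures are trimmed, hypothetical stand-ins for the real ones, and to_tvc() is an illustrative helper name.

```c
#include <stddef.h>

/* Userspace copy of the kernel's container_of() idiom (without the
 * kernel's extra type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified, hypothetical stand-ins for the structures above. */
struct se_cmd {
	int data_direction;
};

struct tcm_vhost_cmd {
	int tvc_vq_desc;
	unsigned long long tvc_tag;
	struct se_cmd tvc_se_cmd;	/* embedded, not a pointer */
};

/* A fabric callback receives only the embedded se_cmd; recover the
 * enclosing tcm_vhost_cmd from it. */
static struct tcm_vhost_cmd *to_tvc(struct se_cmd *se_cmd)
{
	return container_of(se_cmd, struct tcm_vhost_cmd, tvc_se_cmd);
}
```

This only works because tvc_se_cmd is embedded by value; a pointer member would need a separate back-reference.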
--
1.7.2.5
^ permalink raw reply related
* [PATCH 5/6] virtio-scsi: Add vdrv->scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN scanning
From: Nicholas A. Bellinger @ 2012-07-04 4:24 UTC (permalink / raw)
To: target-devel
Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, Michael S. Tsirkin,
Zhi Yong Wu, Anthony Liguori, linux-scsi, Paolo Bonzini, lf-virt,
Christoph Hellwig
In-Reply-To: <1341375846-27882-1-git-send-email-nab@linux-iscsi.org>
From: Nicholas Bellinger <nab@linux-iscsi.org>
This patch changes virtio-scsi to use a new virtio_driver->scan() callback
so that scsi_scan_host() can be properly invoked once virtio_dev_probe() has
set add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK) to signal active virtio-ring
operation, instead of from within virtscsi_probe().
This fixes a bug where SCSI LUN scanning for both virtio-scsi-raw and
virtio-scsi/tcm_vhost setups was happening before VIRTIO_CONFIG_S_DRIVER_OK
had been set, causing VIRTIO_SCSI_S_BAD_TARGET responses. With
virtio-scsi/tcm_vhost in particular, the early LUN scan was not detecting LUNs.
Tested with virtio-scsi-raw + virtio-scsi/tcm_vhost w/ IBLOCK on 3.5-rc2 code.
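The ordering the fix relies on can be modelled in a few lines. In this sketch the fake device (hypothetical, like every fake_* name here) only answers LUN probes once DRIVER_OK is set, a simplification of the VIRTIO_SCSI_S_BAD_TARGET behaviour described above. Scanning from inside probe finds nothing; scanning from the ->scan() hook, which dev_probe() calls after setting DRIVER_OK, finds every LUN.

```c
#include <stdbool.h>
#include <stddef.h>

/* Status bit values from the virtio spec. */
#define VIRTIO_CONFIG_S_DRIVER_OK	4
#define VIRTIO_CONFIG_S_FAILED		128

struct fake_vdev {
	unsigned char status;
	int luns_found;
};

struct fake_driver {
	int (*probe)(struct fake_vdev *dev);
	void (*scan)(struct fake_vdev *dev);	/* optional */
};

/* Hypothetical device model: requests fail until DRIVER_OK is set. */
static bool device_answers(const struct fake_vdev *dev)
{
	return (dev->status & VIRTIO_CONFIG_S_DRIVER_OK) != 0;
}

static void fake_scan(struct fake_vdev *dev)
{
	for (int lun = 0; lun < 4; lun++)
		if (device_answers(dev))
			dev->luns_found++;
}

/* Old behaviour: scan from within probe, before DRIVER_OK. */
static int probe_and_scan_inside(struct fake_vdev *dev)
{
	fake_scan(dev);
	return 0;
}

/* New behaviour: probe only sets things up. */
static int probe_only(struct fake_vdev *dev)
{
	(void)dev;
	return 0;
}

/* Mirrors the order in the patched virtio_dev_probe(). */
static int dev_probe(struct fake_vdev *dev, const struct fake_driver *drv)
{
	int err = drv->probe(dev);

	if (err) {
		dev->status |= VIRTIO_CONFIG_S_FAILED;
	} else {
		dev->status |= VIRTIO_CONFIG_S_DRIVER_OK;
		if (drv->scan)
			drv->scan(dev);
	}
	return err;
}
```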
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
drivers/scsi/virtio_scsi.c | 15 ++++++++++++---
drivers/virtio/virtio.c | 5 ++++-
include/linux/virtio.h | 1 +
3 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 1b38431..391b30d 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -481,9 +481,10 @@ static int __devinit virtscsi_probe(struct virtio_device *vdev)
err = scsi_add_host(shost, &vdev->dev);
if (err)
goto scsi_add_host_failed;
-
- scsi_scan_host(shost);
-
+ /*
+ * scsi_scan_host() happens in virtscsi_scan() via virtio_driver->scan()
+ * after VIRTIO_CONFIG_S_DRIVER_OK has been set..
+ */
return 0;
scsi_add_host_failed:
@@ -493,6 +494,13 @@ virtscsi_init_failed:
return err;
}
+static void virtscsi_scan(struct virtio_device *vdev)
+{
+ struct Scsi_Host *shost = (struct Scsi_Host *)vdev->priv;
+
+ scsi_scan_host(shost);
+}
+
static void virtscsi_remove_vqs(struct virtio_device *vdev)
{
/* Stop all the virtqueues. */
@@ -537,6 +545,7 @@ static struct virtio_driver virtio_scsi_driver = {
.driver.owner = THIS_MODULE,
.id_table = id_table,
.probe = virtscsi_probe,
+ .scan = virtscsi_scan,
#ifdef CONFIG_PM
.freeze = virtscsi_freeze,
.restore = virtscsi_restore,
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index f355807..c3b3f7f 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -141,8 +141,11 @@ static int virtio_dev_probe(struct device *_d)
err = drv->probe(dev);
if (err)
add_status(dev, VIRTIO_CONFIG_S_FAILED);
- else
+ else {
add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+ if (drv->scan)
+ drv->scan(dev);
+ }
return err;
}
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 8efd28a..a1ba8bb 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -92,6 +92,7 @@ struct virtio_driver {
const unsigned int *feature_table;
unsigned int feature_table_size;
int (*probe)(struct virtio_device *dev);
+ void (*scan)(struct virtio_device *dev);
void (*remove)(struct virtio_device *dev);
void (*config_changed)(struct virtio_device *dev);
#ifdef CONFIG_PM
--
1.7.2.5
^ permalink raw reply related
* [PATCH 6/6] virtio-scsi: Set shost->max_id=1 for tcm_vhost WWPNs
From: Nicholas A. Bellinger @ 2012-07-04 4:24 UTC (permalink / raw)
To: target-devel
Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, Michael S. Tsirkin,
Zhi Yong Wu, Anthony Liguori, linux-scsi, Paolo Bonzini, lf-virt,
Christoph Hellwig
In-Reply-To: <1341375846-27882-1-git-send-email-nab@linux-iscsi.org>
From: Nicholas Bellinger <nab@linux-iscsi.org>
This is currently required for connecting to tcm_vhost in order to prevent
the client LUN scan from detecting the same tcm_vhost WWPN on multiple target
IDs.
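A simplified model of why max_id matters: if the target side ignores the target ID in each request, as this commit message implies for tcm_vhost, a scan that walks every (id, lun) pair sees the same LUNs once per ID. The exhaustive loop and the helper names are assumptions for illustration; the real SCSI midlayer scan is more involved (for example, it can use REPORT LUNS).

```c
/* Hypothetical target that ignores the target ID and answers for every
 * ID with the same set of LUNs. */
static int target_answers(unsigned int id, unsigned int lun,
			  unsigned int nr_luns)
{
	(void)id;	/* ID ignored, as the commit message implies */
	return lun < nr_luns;
}

/* Count the devices a naive host-wide scan would create when it walks
 * every (id, lun) pair below max_id and max_lun. */
static unsigned int scan_host(unsigned int max_id, unsigned int max_lun,
			      unsigned int nr_luns)
{
	unsigned int id, lun, found = 0;

	for (id = 0; id < max_id; id++)
		for (lun = 0; lun < max_lun; lun++)
			if (target_answers(id, lun, nr_luns))
				found++;
	return found;
}
```

With max_id = 1 the two LUNs appear once; with 256 target IDs they would appear 512 times.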
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
drivers/scsi/virtio_scsi.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 391b30d..8711951 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -475,7 +475,10 @@ static int __devinit virtscsi_probe(struct virtio_device *vdev)
shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
shost->max_sectors = virtscsi_config_get(vdev, max_sectors) ?: 0xFFFF;
shost->max_lun = virtscsi_config_get(vdev, max_lun) + 1;
- shost->max_id = virtscsi_config_get(vdev, max_target) + 1;
+ /*
+ * Currently required for tcm_vhost to function..
+ */
+ shost->max_id = 1;
shost->max_channel = 0;
shost->max_cmd_len = VIRTIO_SCSI_CDB_SIZE;
err = scsi_add_host(shost, &vdev->dev);
--
1.7.2.5
^ permalink raw reply related
* Re: [PATCH 1/6] vhost: Separate vhost-net features from vhost features
From: Asias He @ 2012-07-04 4:41 UTC (permalink / raw)
To: Nicholas A. Bellinger
Cc: Jens Axboe, Stefan Hajnoczi, kvm-devel, Michael S. Tsirkin,
Zhi Yong Wu, Anthony Liguori, target-devel, linux-scsi,
Paolo Bonzini, lf-virt, Christoph Hellwig
In-Reply-To: <1341375846-27882-2-git-send-email-nab@linux-iscsi.org>
On 07/04/2012 12:24 PM, Nicholas A. Bellinger wrote:
> From: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>
> In order for other vhost devices to use the VHOST_FEATURES bits the
> vhost-net specific bits need to be moved to their own VHOST_NET_FEATURES
> constant.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Nicholas A. Bellinger <nab@risingtidesystems.com>
I think you need to change drivers/vhost/test.c as well.
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 3de00d9..91d6f06 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -261,14 +261,14 @@ static long vhost_test_ioctl(struct file *f, unsigned int ioctl,
return -EFAULT;
return vhost_test_run(n, test);
case VHOST_GET_FEATURES:
- features = VHOST_FEATURES;
+ features = VHOST_NET_FEATURES;
if (copy_to_user(featurep, &features, sizeof features))
return -EFAULT;
return 0;
case VHOST_SET_FEATURES:
if (copy_from_user(&features, featurep, sizeof features))
return -EFAULT;
- if (features & ~VHOST_FEATURES)
+ if (features & ~VHOST_NET_FEATURES)
return -EOPNOTSUPP;
return vhost_test_set_features(n, features);
case VHOST_RESET_OWNER:
> ---
> drivers/vhost/net.c | 4 ++--
> drivers/vhost/vhost.h | 3 ++-
> 2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index f82a739..072cbba 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -823,14 +823,14 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
> return -EFAULT;
> return vhost_net_set_backend(n, backend.index, backend.fd);
> case VHOST_GET_FEATURES:
> - features = VHOST_FEATURES;
> + features = VHOST_NET_FEATURES;
> if (copy_to_user(featurep, &features, sizeof features))
> return -EFAULT;
> return 0;
> case VHOST_SET_FEATURES:
> if (copy_from_user(&features, featurep, sizeof features))
> return -EFAULT;
> - if (features & ~VHOST_FEATURES)
> + if (features & ~VHOST_NET_FEATURES)
> return -EOPNOTSUPP;
> return vhost_net_set_features(n, features);
> case VHOST_RESET_OWNER:
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 8de1fd5..07b9763 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -201,7 +201,8 @@ enum {
> VHOST_FEATURES = (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) |
> (1ULL << VIRTIO_RING_F_INDIRECT_DESC) |
> (1ULL << VIRTIO_RING_F_EVENT_IDX) |
> - (1ULL << VHOST_F_LOG_ALL) |
> + (1ULL << VHOST_F_LOG_ALL),
> + VHOST_NET_FEATURES = VHOST_FEATURES |
> (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
> (1ULL << VIRTIO_NET_F_MRG_RXBUF),
> };
>
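Once the masks are split, a request carrying a vhost-net-only bit is rejected by any device that checks against the generic VHOST_FEATURES, which is why test.c needs the VHOST_NET_FEATURES mask to keep accepting what it accepted before. A small sketch of the mask arithmetic, with bit numbers taken from the virtio/vhost headers; features_ok() is a hypothetical helper mirroring the `features & ~mask` check in the ioctl handlers.

```c
#include <stdint.h>

/* Bit numbers from the virtio/vhost headers. */
#define VIRTIO_NET_F_MRG_RXBUF		15
#define VIRTIO_F_NOTIFY_ON_EMPTY	24
#define VHOST_F_LOG_ALL			26
#define VHOST_NET_F_VIRTIO_NET_HDR	27
#define VIRTIO_RING_F_INDIRECT_DESC	28
#define VIRTIO_RING_F_EVENT_IDX		29

/* Generic bits any vhost device may offer. */
#define VHOST_FEATURES \
	((1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) | \
	 (1ULL << VIRTIO_RING_F_INDIRECT_DESC) | \
	 (1ULL << VIRTIO_RING_F_EVENT_IDX) | \
	 (1ULL << VHOST_F_LOG_ALL))

/* vhost-net adds its device-specific bits on top. */
#define VHOST_NET_FEATURES \
	(VHOST_FEATURES | \
	 (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) | \
	 (1ULL << VIRTIO_NET_F_MRG_RXBUF))

/* Same test as VHOST_SET_FEATURES: reject any bit outside the mask
 * (the kernel returns -EOPNOTSUPP in that case). */
static int features_ok(uint64_t requested, uint64_t supported)
{
	return (requested & ~supported) == 0;
}
```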
--
Asias
^ permalink raw reply related
* Re: [PATCH 1/2] virtio-blk spec: document topology info
From: Rusty Russell @ 2012-07-04 5:52 UTC (permalink / raw)
To: Paolo Bonzini, virtualization, kvm
In-Reply-To: <1341321412-24214-2-git-send-email-pbonzini@redhat.com>
On Tue, 3 Jul 2012 15:16:51 +0200, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Current QEMU and Linux drivers can export queue parameters via the
> virtio-blk configuration space. Document this, since the next patch
> will have to add another configuration field after these.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Applied, thanks for the grammar fix, too!
> + u8 physical_block_exp;
> + u8 alignment_offset;
> + u16 min_io_size;
> + u32 opt_io_size;
These aren't entirely self-evident though :(
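For reference, the Linux virtio_blk driver's topology handling interprets these fields in units of the logical block size (blk_size): the physical block size is blk_size << physical_block_exp, and alignment_offset, min_io_size and opt_io_size are each multiplied by blk_size. A sketch of that conversion, which may help when writing the field descriptions; the struct and function names are illustrative, not from the spec.

```c
#include <stdint.h>

/* Field layout as quoted above. */
struct blk_topology {
	uint8_t  physical_block_exp;	/* log2(logical blocks per physical block) */
	uint8_t  alignment_offset;	/* offset of first aligned logical block */
	uint16_t min_io_size;		/* suggested minimum I/O, in logical blocks */
	uint32_t opt_io_size;		/* optimal I/O size, in logical blocks */
};

struct topology_bytes {
	uint64_t physical_block;
	uint64_t alignment_offset;
	uint64_t min_io;
	uint64_t opt_io;
};

/* Convert the block-based fields to bytes, given blk_size. */
static struct topology_bytes topology_to_bytes(const struct blk_topology *t,
					       uint32_t blk_size)
{
	struct topology_bytes b;

	b.physical_block = (uint64_t)blk_size << t->physical_block_exp;
	b.alignment_offset = (uint64_t)t->alignment_offset * blk_size;
	b.min_io = (uint64_t)t->min_io_size * blk_size;
	b.opt_io = (uint64_t)t->opt_io_size * blk_size;
	return b;
}
```

For example, a 512-byte logical / 4096-byte physical disk reports physical_block_exp = 3.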
I've put the latest on github (your changes included!), so please patch
against that:
https://github.com/rustyrussell/virtio-spec
Thanks,
Rusty.
^ permalink raw reply
* Re: [PATCH v2] virtio-scsi: hotplug support for virtio-scsi
From: Zhi Yong Wu @ 2012-07-04 6:10 UTC (permalink / raw)
To: Cong Meng
Cc: stefanha, linux-scsi, zwanp, linuxram, senwang, linux-kernel,
Paolo Bonzini, virtualization
In-Reply-To: <1341294085-17164-1-git-send-email-mc@linux.vnet.ibm.com>
On Tue, Jul 3, 2012 at 1:41 PM, Cong Meng <mc@linux.vnet.ibm.com> wrote:
> This patch implements the hotplug support for virtio-scsi.
> When there is a device attached/detached, the virtio-scsi driver will be
> signaled via event virtual queue and it will add/remove the scsi device
> in question automatically.
>
> v2: handle no_event event
>
> Signed-off-by: Cong Meng <mc@linux.vnet.ibm.com>
> Signed-off-by: Sen Wang <senwang@linux.vnet.ibm.com>
> ---
> drivers/scsi/virtio_scsi.c | 113 ++++++++++++++++++++++++++++++++++++++++++-
> include/linux/virtio_scsi.h | 9 ++++
> 2 files changed, 121 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index 9fc5e67..e44b2d6 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -25,6 +25,7 @@
> #include <scsi/scsi_cmnd.h>
>
> #define VIRTIO_SCSI_MEMPOOL_SZ 64
> +#define VIRTIO_SCSI_EVENT_LEN 8
>
> /* Command queue element */
> struct virtio_scsi_cmd {
> @@ -43,6 +44,12 @@ struct virtio_scsi_cmd {
> } resp;
> } ____cacheline_aligned_in_smp;
>
> +struct virtio_scsi_event_node {
> + struct virtio_scsi *vscsi;
> + struct virtio_scsi_event event;
> + struct work_struct work;
> +};
> +
> struct virtio_scsi_vq {
> /* Protects vq */
> spinlock_t vq_lock;
> @@ -67,6 +74,9 @@ struct virtio_scsi {
> struct virtio_scsi_vq event_vq;
> struct virtio_scsi_vq req_vq;
>
> + /* Get some buffers reday for event vq */
s/reday/ready/ ^^^^
> + struct virtio_scsi_event_node event_list[VIRTIO_SCSI_EVENT_LEN];
> +
> struct virtio_scsi_target_state *tgt[];
> };
>
> @@ -202,6 +212,97 @@ static void virtscsi_ctrl_done(struct virtqueue *vq)
> spin_unlock_irqrestore(&vscsi->ctrl_vq.vq_lock, flags);
> };
>
> +static int virtscsi_kick_event(struct virtio_scsi *vscsi,
> + struct virtio_scsi_event_node *event_node)
> +{
> + int ret;
> + struct scatterlist sg;
> + unsigned long flags;
> +
> + sg_set_buf(&sg, &event_node->event, sizeof(struct virtio_scsi_event));
> +
> + spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
> +
> + ret = virtqueue_add_buf(vscsi->event_vq.vq, &sg, 0, 1, event_node, GFP_ATOMIC);
> + if (ret >= 0)
> + virtqueue_kick(vscsi->event_vq.vq);
> +
> + spin_unlock_irqrestore(&vscsi->event_vq.vq_lock, flags);
> +
> + return ret;
> +}
> +
> +static int virtscsi_kick_event_all(struct virtio_scsi *vscsi)
> +{
> + int i;
> +
> + for (i = 0; i < VIRTIO_SCSI_EVENT_LEN; i++) {
> + vscsi->event_list[i].vscsi = vscsi;
> + virtscsi_kick_event(vscsi, &vscsi->event_list[i]);
> + }
> +
> + return 0;
> +}
> +
> +static void virtscsi_handle_transport_reset(struct virtio_scsi *vscsi,
> + struct virtio_scsi_event *event)
> +{
> + struct scsi_device *sdev;
> + struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
> + unsigned int target = event->lun[1];
> + unsigned int lun = (event->lun[2] << 8) | event->lun[3];
> +
> + switch (event->reason) {
> + case VIRTIO_SCSI_EVT_RESET_RESCAN:
> + scsi_add_device(shost, 0, target, lun);
> + break;
> + case VIRTIO_SCSI_EVT_RESET_REMOVED:
> + sdev = scsi_device_lookup(shost, 0, target, lun);
> + if (sdev) {
> + scsi_remove_device(sdev);
> + scsi_device_put(sdev);
> + } else {
> + pr_err("SCSI device %d 0 %d %d not found\n",
> + shost->host_no, target, lun);
> + }
> + break;
> + default:
> + pr_info("Unsupport virtio scsi event reason %x\n", event->reason);
> + }
> +}
> +
> +static void virtscsi_handle_event(struct work_struct *work)
> +{
> + struct virtio_scsi_event_node *event_node =
> + container_of(work, struct virtio_scsi_event_node, work);
> + struct virtio_scsi *vscsi = event_node->vscsi;
> + struct virtio_scsi_event *event = &event_node->event;
> +
> + if (event->event & VIRTIO_SCSI_T_EVENTS_MISSED) {
> + event->event &= (~VIRTIO_SCSI_T_EVENTS_MISSED);
> + /* FIXME: handle event missed here */
> + }
> +
> + switch (event->event) {
> + case VIRTIO_SCSI_T_NO_EVENT:
> + break;
> + case VIRTIO_SCSI_T_TRANSPORT_RESET:
> + virtscsi_handle_transport_reset(vscsi, event);
> + break;
> + default:
> + pr_err("Unsupport virtio scsi event %x\n", event->event);
> + }
> + virtscsi_kick_event(vscsi, event_node);
> +}
> +
> +static void virtscsi_complete_event(void *buf)
> +{
> + struct virtio_scsi_event_node *event_node = buf;
> +
> + INIT_WORK(&event_node->work, virtscsi_handle_event);
> + schedule_work(&event_node->work);
> +}
> +
> static void virtscsi_event_done(struct virtqueue *vq)
> {
> struct Scsi_Host *sh = virtio_scsi_host(vq->vdev);
> @@ -209,7 +310,7 @@ static void virtscsi_event_done(struct virtqueue *vq)
> unsigned long flags;
>
> spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
> - virtscsi_vq_done(vq, virtscsi_complete_free);
> + virtscsi_vq_done(vq, virtscsi_complete_event);
> spin_unlock_irqrestore(&vscsi->event_vq.vq_lock, flags);
> };
>
> @@ -510,6 +611,10 @@ static int virtscsi_init(struct virtio_device *vdev,
> virtscsi_config_set(vdev, cdb_size, VIRTIO_SCSI_CDB_SIZE);
> virtscsi_config_set(vdev, sense_size, VIRTIO_SCSI_SENSE_SIZE);
>
> + if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
> + virtscsi_kick_event_all(vscsi);
> + }
> +
> /* We need to know how many segments before we allocate. */
> sg_elems = virtscsi_config_get(vdev, seg_max) ?: 1;
>
> @@ -608,7 +713,13 @@ static struct virtio_device_id id_table[] = {
> { 0 },
> };
>
> +static unsigned int features[] = {
> + VIRTIO_SCSI_F_HOTPLUG
> +};
> +
> static struct virtio_driver virtio_scsi_driver = {
> + .feature_table = features,
> + .feature_table_size = ARRAY_SIZE(features),
> .driver.name = KBUILD_MODNAME,
> .driver.owner = THIS_MODULE,
> .id_table = id_table,
> diff --git a/include/linux/virtio_scsi.h b/include/linux/virtio_scsi.h
> index 8ddeafd..dc8d305 100644
> --- a/include/linux/virtio_scsi.h
> +++ b/include/linux/virtio_scsi.h
> @@ -69,6 +69,10 @@ struct virtio_scsi_config {
> u32 max_lun;
> } __packed;
>
> +/* Feature Bits */
> +#define VIRTIO_SCSI_F_INOUT 0
> +#define VIRTIO_SCSI_F_HOTPLUG 1
> +
> /* Response codes */
> #define VIRTIO_SCSI_S_OK 0
> #define VIRTIO_SCSI_S_OVERRUN 1
> @@ -105,6 +109,11 @@ struct virtio_scsi_config {
> #define VIRTIO_SCSI_T_TRANSPORT_RESET 1
> #define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
>
> +/* Reasons of transport reset event */
> +#define VIRTIO_SCSI_EVT_RESET_HARD 0
> +#define VIRTIO_SCSI_EVT_RESET_RESCAN 1
> +#define VIRTIO_SCSI_EVT_RESET_REMOVED 2
> +
> #define VIRTIO_SCSI_S_SIMPLE 0
> #define VIRTIO_SCSI_S_ORDERED 1
> #define VIRTIO_SCSI_S_HEAD 2
> --
> 1.7.7.6
>
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
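The hotplug path described in the patch boils down to decoding the target and LUN from the event's 8-byte lun field and choosing an action from the reset reason. A userspace sketch of just that decision logic, following the patch's decoding (target in lun[1], LUN in lun[2..3]); decoded_addr, hotplug_action and the function names are hypothetical. As in the patch, VIRTIO_SCSI_EVT_RESET_HARD falls through to the unsupported case.

```c
#include <stdint.h>

#define VIRTIO_SCSI_EVT_RESET_HARD	0
#define VIRTIO_SCSI_EVT_RESET_RESCAN	1
#define VIRTIO_SCSI_EVT_RESET_REMOVED	2

enum hotplug_action { ACTION_NONE, ACTION_ADD, ACTION_REMOVE };

struct decoded_addr {
	unsigned int target;
	unsigned int lun;
};

/* Decode the 8-byte LUN field the same way the patch does. */
static struct decoded_addr decode_event_lun(const uint8_t lun[8])
{
	struct decoded_addr a;

	a.target = lun[1];
	a.lun = ((unsigned int)lun[2] << 8) | lun[3];
	return a;
}

/* Map a transport-reset reason to the action taken in
 * virtscsi_handle_transport_reset(). */
static enum hotplug_action reset_action(uint32_t reason)
{
	switch (reason) {
	case VIRTIO_SCSI_EVT_RESET_RESCAN:
		return ACTION_ADD;	/* scsi_add_device() */
	case VIRTIO_SCSI_EVT_RESET_REMOVED:
		return ACTION_REMOVE;	/* lookup, remove, put */
	default:
		return ACTION_NONE;	/* logged as unsupported */
	}
}
```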
--
Regards,
Zhi Yong Wu
^ permalink raw reply
* Re: [PATCH v3 2/4] virtio_balloon: handle concurrent accesses to virtio_balloon struct elements
From: Rusty Russell @ 2012-07-04 6:38 UTC (permalink / raw)
To: linux-mm
Cc: Rik van Riel, Rafael Aquini, Michael S. Tsirkin,
Konrad Rzeszutek Wilk, linux-kernel, virtualization, Minchan Kim,
Andi Kleen, Andrew Morton
In-Reply-To: <e5f3c6d456f04adeac9fd714a6278424d71a97a0.1341353014.git.aquini@redhat.com>
On Tue, 3 Jul 2012 20:48:50 -0300, Rafael Aquini <aquini@redhat.com> wrote:
> This patch introduces access synchronization to critical elements of struct
> virtio_balloon, in order to cope with the thread concurrency that the
> compaction/migration bits might end up imposing on the balloon driver in
> several situations.
>
> Signed-off-by: Rafael Aquini <aquini@redhat.com>
That's pretty vague, and it's almost impossible to audit this.
It looks very suspicious though:
> -static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
> -{
> - struct scatterlist sg;
> -
> - sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> +/* Protection for concurrent accesses to balloon virtqueues and vb->acked */
> +DEFINE_MUTEX(vb_queue_completion);
>
> +static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq,
> + struct scatterlist *sg)
> +{
> + mutex_lock(&vb_queue_completion);
> init_completion(&vb->acked);
>
> /* We should always be able to add one buffer to an empty queue. */
> - if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
> + if (virtqueue_add_buf(vq, sg, 1, 0, vb, GFP_KERNEL) < 0)
> BUG();
> virtqueue_kick(vq);
>
> /* When host has read buffer, this completes via balloon_ack */
> wait_for_completion(&vb->acked);
> + mutex_unlock(&vb_queue_completion);
> }
OK, this lock is superseded by Michael's patch, and AFAICT is not due to
any requirement introduced by these patches.
> static void set_page_pfns(u32 pfns[], struct page *page)
> @@ -126,9 +132,12 @@ static void set_page_pfns(u32 pfns[], struct page *page)
>
> static void fill_balloon(struct virtio_balloon *vb, size_t num)
> {
> + struct scatterlist sg;
> + int alloc_failed = 0;
> /* We can only do one array worth at a time. */
> num = min(num, ARRAY_SIZE(vb->pfns));
>
> + spin_lock(&vb->pfn_list_lock);
> for (vb->num_pfns = 0; vb->num_pfns < num;
> vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
> @@ -138,8 +147,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
> dev_printk(KERN_INFO, &vb->vdev->dev,
> "Out of puff! Can't get %zu pages\n",
> num);
> - /* Sleep for at least 1/5 of a second before retry. */
> - msleep(200);
> + alloc_failed = 1;
> break;
> }
> set_page_pfns(vb->pfns + vb->num_pfns, page);
> @@ -149,10 +157,19 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
> }
>
> /* Didn't get any? Oh well. */
> - if (vb->num_pfns == 0)
> + if (vb->num_pfns == 0) {
> + spin_unlock(&vb->pfn_list_lock);
> return;
> + }
> +
> + sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> + spin_unlock(&vb->pfn_list_lock);
>
> - tell_host(vb, vb->inflate_vq);
> + /* alloc_page failed, sleep for at least 1/5 of a sec before retry. */
> + if (alloc_failed)
> + msleep(200);
> +
> + tell_host(vb, vb->inflate_vq, &sg);
So, we drop the lock which protects vb->pfns[] and vb->num_pfns, then
use it in tell_host? Surely it could be corrupted between there.
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index bfbc15c..d47c5c2 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -51,6 +51,10 @@ struct virtio_balloon
>
> /* Number of balloon pages we've told the Host we're not using. */
> unsigned int num_pages;
> +
> + /* Protect 'pages', 'pfns' & 'num_pnfs' against concurrent updates */
> + spinlock_t pfn_list_lock;
> +
> /*
> * The pages we've told the Host we're not using.
> * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
You might be better off taking num_pfns and pfns[] out of struct
virtio_balloon, and putting them on the stack (maybe 64, not 256).
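A sketch of the on-stack alternative, assuming a batch of 64 PFNs. Because each caller fills its own array, nothing shared needs locking around pfns[]/num_pfns; and because tell_host() waits for the host's ack before returning, the stack buffer stays valid while the host reads it. The fake_* helpers, PFNS_PER_BATCH and fill_batch() are hypothetical, and the balloon's page list would still need its own protection.

```c
#include <stddef.h>
#include <stdint.h>

#define PFNS_PER_BATCH 64	/* suggested size; hypothetical name */

/* Hypothetical stand-ins for alloc_page() and tell_host(). */
static uint32_t next_pfn = 1000;
static size_t pages_told;

static int fake_alloc_pfn(uint32_t *pfn)
{
	*pfn = next_pfn++;
	return 0;
}

static void fake_tell_host(const uint32_t *pfns, size_t num)
{
	(void)pfns;
	pages_told += num;
}

/* Each caller fills its own on-stack array, so concurrent callers never
 * share pfns[] or num_pfns. */
static size_t fill_batch(size_t num)
{
	uint32_t pfns[PFNS_PER_BATCH];
	size_t num_pfns;

	if (num > PFNS_PER_BATCH)
		num = PFNS_PER_BATCH;

	for (num_pfns = 0; num_pfns < num; num_pfns++)
		if (fake_alloc_pfn(&pfns[num_pfns]) < 0)
			break;

	if (num_pfns)
		fake_tell_host(pfns, num_pfns);
	return num_pfns;
}
```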
Cheers,
Rusty.
^ permalink raw reply
* [PATCH] xen: populate correct number of pages when across mem boundary
From: zhenzhong.duan @ 2012-07-04 6:49 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk, jeremy, tglx, mingo, hpa, xen-devel
Cc: x86, Feng Jin, linux-kernel, virtualization
When populating pages across a memory boundary at boot, the number of
pages populated is wrong, because pages populated into a non-RAM region
are ignored.
The PFN range is also wrongly aligned when the memory boundary is not
page aligned.
We also need to handle the rare case where xen_do_chunk() does not
populate every requested page.
For a dom0 booted with dom0_mem=3368952K (0xcd9ff000 - 4K), the dmesg diff is:
[ 0.000000] Freeing 9e-100 pfn range: 98 pages freed
[ 0.000000] 1-1 mapping on 9e->100
[ 0.000000] 1-1 mapping on cd9ff->100000
[ 0.000000] Released 98 pages of unused memory
[ 0.000000] Set 206435 page(s) to 1-1 mapping
-[ 0.000000] Populating cd9fe-cda00 pfn range: 1 pages added
+[ 0.000000] Populating cd9fe-cd9ff pfn range: 1 pages added
+[ 0.000000] Populating 100000-100061 pfn range: 97 pages added
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] Xen: 0000000000000000 - 000000000009e000 (usable)
[ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
[ 0.000000] Xen: 0000000000100000 - 00000000cd9ff000 (usable)
[ 0.000000] Xen: 00000000cd9ffc00 - 00000000cda53c00 (ACPI NVS)
...
[ 0.000000] Xen: 0000000100000000 - 0000000100061000 (usable)
[ 0.000000] Xen: 0000000100061000 - 000000012c000000 (unusable)
...
[ 0.000000] MEMBLOCK configuration:
...
-[ 0.000000] reserved[0x4] [0x000000cd9ff000-0x000000cd9ffbff], 0xc00 bytes
-[ 0.000000] reserved[0x5] [0x00000100000000-0x00000100060fff], 0x61000 bytes
Related xen memory layout:
(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009ec00 (usable)
(XEN) 00000000000f0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 00000000cd9ffc00 (usable)
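The alignment half of the fix can be checked with the numbers above. The RAM entry ends at 0xcd9ffc00, so page 0xcd9ff is only partly RAM (the ACPI NVS region starts at 0xcd9ffc00). Rounding the end up (PFN_UP) includes that page; rounding it down (PFN_DOWN), as the patch now does, stops at 0xcd9ff, matching the corrected "cd9fe-cd9ff" line. The macros follow the kernel's definitions; the helper functions are illustrative.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Same definitions as the kernel's PFN_UP()/PFN_DOWN(). */
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

struct pfn_range {
	uint64_t s_pfn;	/* first PFN */
	uint64_t e_pfn;	/* one past the last PFN */
};

/* Old rounding: widen the entry outward to whole pages. */
static struct pfn_range range_outward(uint64_t addr, uint64_t size)
{
	struct pfn_range r = { PFN_DOWN(addr), PFN_UP(addr + size) };
	return r;
}

/* Patched rounding: shrink inward so every page is entirely RAM. */
static struct pfn_range range_inward(uint64_t addr, uint64_t size)
{
	struct pfn_range r = { PFN_UP(addr), PFN_DOWN(addr + size) };
	return r;
}
```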
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
---
arch/x86/xen/setup.c | 24 +++++++++++-------------
1 files changed, 11 insertions(+), 13 deletions(-)
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index a4790bf..bd78773 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -157,50 +157,48 @@ static unsigned long __init xen_populate_chunk(
unsigned long dest_pfn;
for (i = 0, entry = list; i < map_size; i++, entry++) {
- unsigned long credits = credits_left;
unsigned long s_pfn;
unsigned long e_pfn;
unsigned long pfns;
long capacity;
- if (credits <= 0)
+ if (credits_left <= 0)
break;
if (entry->type != E820_RAM)
continue;
- e_pfn = PFN_UP(entry->addr + entry->size);
+ e_pfn = PFN_DOWN(entry->addr + entry->size);
/* We only care about E820 after the xen_start_info->nr_pages */
if (e_pfn <= max_pfn)
continue;
- s_pfn = PFN_DOWN(entry->addr);
+ s_pfn = PFN_UP(entry->addr);
/* If the E820 falls within the nr_pages, we want to start
* at the nr_pages PFN.
* If that would mean going past the E820 entry, skip it
*/
+again:
if (s_pfn <= max_pfn) {
capacity = e_pfn - max_pfn;
dest_pfn = max_pfn;
} else {
- /* last_pfn MUST be within E820_RAM regions */
- if (*last_pfn && e_pfn >= *last_pfn)
- s_pfn = *last_pfn;
capacity = e_pfn - s_pfn;
dest_pfn = s_pfn;
}
- /* If we had filled this E820_RAM entry, go to the next one. */
- if (capacity <= 0)
- continue;
- if (credits > capacity)
- credits = capacity;
+ if (credits_left < capacity)
+ capacity = credits_left;
- pfns = xen_do_chunk(dest_pfn, dest_pfn + credits, false);
+ pfns = xen_do_chunk(dest_pfn, dest_pfn + capacity, false);
done += pfns;
credits_left -= pfns;
*last_pfn = (dest_pfn + pfns);
+ if (credits_left > 0 && *last_pfn < e_pfn) {
+ s_pfn = *last_pfn;
+ goto again;
+ }
}
return done;
}
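A simplified userspace model of the patched loop, with inputs loosely based on the log above (98 credits, max_pfn 0xcd9fe, and an assumed second RAM range ending at PFN 0x12c000). fake_do_chunk() is a hypothetical stand-in for xen_do_chunk() whose chunk_limit simulates a partial populate. One deliberate difference: the patched loop retries the same range for as long as xen_do_chunk() returns 0, whereas this sketch stops instead; treat that as an assumption about the desired behaviour.

```c
struct ram_range {
	unsigned long s_pfn;	/* first whole RAM page */
	unsigned long e_pfn;	/* one past the last whole RAM page */
};

/* Hypothetical xen_do_chunk() stand-in: adds at most chunk_limit pages
 * per call, to model a partially failed populate. */
static unsigned long chunk_limit = ~0UL;

static unsigned long fake_do_chunk(unsigned long start, unsigned long end)
{
	unsigned long want = end - start;

	return want < chunk_limit ? want : chunk_limit;
}

/* Hand out 'credits' extra pages across RAM ranges above max_pfn,
 * retrying inside a range after a short chunk, as the patch does. */
static unsigned long populate(const struct ram_range *list, int n,
			      unsigned long max_pfn, long credits,
			      unsigned long *last_pfn)
{
	unsigned long done = 0;

	for (int i = 0; i < n && credits > 0; i++) {
		unsigned long s_pfn = list[i].s_pfn;
		unsigned long e_pfn = list[i].e_pfn;

		if (e_pfn <= max_pfn)
			continue;
		if (s_pfn < max_pfn)	/* start at max_pfn inside the range */
			s_pfn = max_pfn;

		while (credits > 0 && s_pfn < e_pfn) {
			unsigned long capacity = e_pfn - s_pfn;
			unsigned long pfns;

			if ((unsigned long)credits < capacity)
				capacity = credits;
			pfns = fake_do_chunk(s_pfn, s_pfn + capacity);
			if (pfns == 0)	/* sketch only: stop rather than spin */
				return done;
			done += pfns;
			credits -= pfns;
			s_pfn += pfns;
			*last_pfn = s_pfn;
		}
	}
	return done;
}
```

With these inputs, one page lands at 0xcd9fe and the remaining 97 at 0x100000-0x100061, matching the corrected log lines.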
--
1.7.3
^ permalink raw reply related
* Re: [PATCH v2] virtio-scsi: hotplug support for virtio-scsi
From: mc @ 2012-07-04 8:11 UTC (permalink / raw)
To: Paolo Bonzini
Cc: stefanha, linux-scsi, senwang, zwanp, linuxram, linux-kernel,
virtualization
In-Reply-To: <4FF2DC6C.1010506@redhat.com>
Quoting Paolo Bonzini <pbonzini@redhat.com>:
> Il 03/07/2012 07:41, Cong Meng ha scritto:
>> This patch implements the hotplug support for virtio-scsi.
>> When there is a device attached/detached, the virtio-scsi driver will be
>> signaled via event virtual queue and it will add/remove the scsi device
>> in question automatically.
>>
>> v2: handle no_event event
>>
>> Signed-off-by: Cong Meng <mc@linux.vnet.ibm.com>
>> Signed-off-by: Sen Wang <senwang@linux.vnet.ibm.com>
>
> The SoB lines are swapped. Otherwise looks good. Since you have to
> respin, please add dropped event support too, it shouldn't be hard.
What does "The SoB lines are swapped" mean? Should the changelog be
placed after the SoB lines?
>
> Paolo
>
>> ---
>> drivers/scsi/virtio_scsi.c | 113 ++++++++++++++++++++++++++++++++++++++++++-
>> include/linux/virtio_scsi.h | 9 ++++
>> 2 files changed, 121 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
>> index 9fc5e67..e44b2d6 100644
>> --- a/drivers/scsi/virtio_scsi.c
>> +++ b/drivers/scsi/virtio_scsi.c
>> @@ -25,6 +25,7 @@
>> #include <scsi/scsi_cmnd.h>
>>
>> #define VIRTIO_SCSI_MEMPOOL_SZ 64
>> +#define VIRTIO_SCSI_EVENT_LEN 8
>>
>> /* Command queue element */
>> struct virtio_scsi_cmd {
>> @@ -43,6 +44,12 @@ struct virtio_scsi_cmd {
>> } resp;
>> } ____cacheline_aligned_in_smp;
>>
>> +struct virtio_scsi_event_node {
>> + struct virtio_scsi *vscsi;
>> + struct virtio_scsi_event event;
>> + struct work_struct work;
>> +};
>> +
>> struct virtio_scsi_vq {
>> /* Protects vq */
>> spinlock_t vq_lock;
>> @@ -67,6 +74,9 @@ struct virtio_scsi {
>> struct virtio_scsi_vq event_vq;
>> struct virtio_scsi_vq req_vq;
>>
>> + /* Get some buffers reday for event vq */
>> + struct virtio_scsi_event_node event_list[VIRTIO_SCSI_EVENT_LEN];
>> +
>> struct virtio_scsi_target_state *tgt[];
>> };
>>
>> @@ -202,6 +212,97 @@ static void virtscsi_ctrl_done(struct virtqueue *vq)
>> spin_unlock_irqrestore(&vscsi->ctrl_vq.vq_lock, flags);
>> };
>>
>> +static int virtscsi_kick_event(struct virtio_scsi *vscsi,
>> + struct virtio_scsi_event_node *event_node)
>> +{
>> + int ret;
>> + struct scatterlist sg;
>> + unsigned long flags;
>> +
>> + sg_set_buf(&sg, &event_node->event, sizeof(struct virtio_scsi_event));
>> +
>> + spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
>> +
>> + ret = virtqueue_add_buf(vscsi->event_vq.vq, &sg, 0, 1, event_node, GFP_ATOMIC);
>> + if (ret >= 0)
>> + virtqueue_kick(vscsi->event_vq.vq);
>> +
>> + spin_unlock_irqrestore(&vscsi->event_vq.vq_lock, flags);
>> +
>> + return ret;
>> +}
>> +
>> +static int virtscsi_kick_event_all(struct virtio_scsi *vscsi)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < VIRTIO_SCSI_EVENT_LEN; i++) {
>> + vscsi->event_list[i].vscsi = vscsi;
>> + virtscsi_kick_event(vscsi, &vscsi->event_list[i]);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static void virtscsi_handle_transport_reset(struct virtio_scsi *vscsi,
>> + struct virtio_scsi_event *event)
>> +{
>> + struct scsi_device *sdev;
>> + struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
>> + unsigned int target = event->lun[1];
>> + unsigned int lun = (event->lun[2] << 8) | event->lun[3];
>> +
>> + switch (event->reason) {
>> + case VIRTIO_SCSI_EVT_RESET_RESCAN:
>> + scsi_add_device(shost, 0, target, lun);
>> + break;
>> + case VIRTIO_SCSI_EVT_RESET_REMOVED:
>> + sdev = scsi_device_lookup(shost, 0, target, lun);
>> + if (sdev) {
>> + scsi_remove_device(sdev);
>> + scsi_device_put(sdev);
>> + } else {
>> + pr_err("SCSI device %d 0 %d %d not found\n",
>> + shost->host_no, target, lun);
>> + }
>> + break;
>> + default:
>> + pr_info("Unsupport virtio scsi event reason %x\n", event->reason);
>> + }
>> +}
>> +
>> +static void virtscsi_handle_event(struct work_struct *work)
>> +{
>> + struct virtio_scsi_event_node *event_node =
>> + container_of(work, struct virtio_scsi_event_node, work);
>> + struct virtio_scsi *vscsi = event_node->vscsi;
>> + struct virtio_scsi_event *event = &event_node->event;
>> +
>> + if (event->event & VIRTIO_SCSI_T_EVENTS_MISSED) {
>> + event->event &= (~VIRTIO_SCSI_T_EVENTS_MISSED);
>> + /* FIXME: handle event missed here */
>> + }
>> +
>> + switch (event->event) {
>> + case VIRTIO_SCSI_T_NO_EVENT:
>> + break;
>> + case VIRTIO_SCSI_T_TRANSPORT_RESET:
>> + virtscsi_handle_transport_reset(vscsi, event);
>> + break;
>> + default:
>> + pr_err("Unsupport virtio scsi event %x\n", event->event);
>> + }
>> + virtscsi_kick_event(vscsi, event_node);
>> +}
>> +
>> +static void virtscsi_complete_event(void *buf)
>> +{
>> + struct virtio_scsi_event_node *event_node = buf;
>> +
>> + INIT_WORK(&event_node->work, virtscsi_handle_event);
>> + schedule_work(&event_node->work);
>> +}
>> +
>> static void virtscsi_event_done(struct virtqueue *vq)
>> {
>> struct Scsi_Host *sh = virtio_scsi_host(vq->vdev);
>> @@ -209,7 +310,7 @@ static void virtscsi_event_done(struct virtqueue *vq)
>> unsigned long flags;
>>
>> spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);
>> - virtscsi_vq_done(vq, virtscsi_complete_free);
>> + virtscsi_vq_done(vq, virtscsi_complete_event);
>> spin_unlock_irqrestore(&vscsi->event_vq.vq_lock, flags);
>> };
>>
>> @@ -510,6 +611,10 @@ static int virtscsi_init(struct virtio_device *vdev,
>> virtscsi_config_set(vdev, cdb_size, VIRTIO_SCSI_CDB_SIZE);
>> virtscsi_config_set(vdev, sense_size, VIRTIO_SCSI_SENSE_SIZE);
>>
>> + if (virtio_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) {
>> + virtscsi_kick_event_all(vscsi);
>> + }
>> +
>> /* We need to know how many segments before we allocate. */
>> sg_elems = virtscsi_config_get(vdev, seg_max) ?: 1;
>>
>> @@ -608,7 +713,13 @@ static struct virtio_device_id id_table[] = {
>> { 0 },
>> };
>>
>> +static unsigned int features[] = {
>> + VIRTIO_SCSI_F_HOTPLUG
>> +};
>> +
>> static struct virtio_driver virtio_scsi_driver = {
>> + .feature_table = features,
>> + .feature_table_size = ARRAY_SIZE(features),
>> .driver.name = KBUILD_MODNAME,
>> .driver.owner = THIS_MODULE,
>> .id_table = id_table,
>> diff --git a/include/linux/virtio_scsi.h b/include/linux/virtio_scsi.h
>> index 8ddeafd..dc8d305 100644
>> --- a/include/linux/virtio_scsi.h
>> +++ b/include/linux/virtio_scsi.h
>> @@ -69,6 +69,10 @@ struct virtio_scsi_config {
>> u32 max_lun;
>> } __packed;
>>
>> +/* Feature Bits */
>> +#define VIRTIO_SCSI_F_INOUT 0
>> +#define VIRTIO_SCSI_F_HOTPLUG 1
>> +
>> /* Response codes */
>> #define VIRTIO_SCSI_S_OK 0
>> #define VIRTIO_SCSI_S_OVERRUN 1
>> @@ -105,6 +109,11 @@ struct virtio_scsi_config {
>> #define VIRTIO_SCSI_T_TRANSPORT_RESET 1
>> #define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
>>
>> +/* Reasons of transport reset event */
>> +#define VIRTIO_SCSI_EVT_RESET_HARD 0
>> +#define VIRTIO_SCSI_EVT_RESET_RESCAN 1
>> +#define VIRTIO_SCSI_EVT_RESET_REMOVED 2
>> +
>> #define VIRTIO_SCSI_S_SIMPLE 0
>> #define VIRTIO_SCSI_S_ORDERED 1
>> #define VIRTIO_SCSI_S_HEAD 2
>>
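On the dropped-event request: the patch currently just clears VIRTIO_SCSI_T_EVENTS_MISSED and leaves a FIXME. One plausible approach (an assumption, not necessarily what was intended) is to treat the flag as "device state unknown" and rescan the whole host, e.g. with scsi_scan_host(), after clearing it. The helper below only separates the flag from the event code; take_missed_flag() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_SCSI_T_NO_EVENT		0
#define VIRTIO_SCSI_T_TRANSPORT_RESET	1
#define VIRTIO_SCSI_T_EVENTS_MISSED	0x80000000u

/* Returns true when the device reported dropped events, so the caller
 * can fall back to a full rescan. The flag is cleared so the remaining
 * event code can be dispatched as usual. */
static bool take_missed_flag(uint32_t *event)
{
	bool missed = (*event & VIRTIO_SCSI_T_EVENTS_MISSED) != 0;

	*event &= ~VIRTIO_SCSI_T_EVENTS_MISSED;
	return missed;
}
```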
^ permalink raw reply