* [PATCH RFC 01/11] um: Make UBD requests synchronous in TT ext/infcpu mode
2023-11-03 16:41 [PATCH RFC 00/11] Several Time Travel Mode Enhancements Benjamin Beichler
@ 2023-11-03 16:41 ` Benjamin Beichler
2023-11-06 20:24 ` Johannes Berg
2023-11-03 16:41 ` [PATCH RFC 02/11] um: add a simple time_travel_handler implementation Benjamin Beichler
` (9 subsequent siblings)
10 siblings, 1 reply; 36+ messages in thread
From: Benjamin Beichler @ 2023-11-03 16:41 UTC (permalink / raw)
To: Richard Weinberger, Anton Ivanov, Johannes Berg
Cc: linux-um, Benjamin Beichler
The UBD driver employs multiple threads to enhance block device accesses
in userspace. These threads communicate via pipes and are triggered by
interrupts that utilize the SIGIO handler.
However, in TT mode, both inf-cpu and external modes, this asynchronous,
multithreaded request processing lead to issues where requests are not
processed. This occurs because there is no dedicated time travel handler
for the UBD interrupt.
Since asynchronous, multithreaded request processing does not provide
substantial benefits in time travel mode and may even introduce
additional overhead (as multiple threads are scheduled sequentially to
execute requests in TT mode with infinite CPU power), this patch
switches to synchronous request processing directly within the
submit_request call for the respective TT modes.
Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
---
arch/um/drivers/ubd_kern.c | 45 ++++++++++++++++++++++++++++++++++++++-------
1 file changed, 38 insertions(+), 7 deletions(-)
diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index 50206feac577..cdad289d5032 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -44,6 +44,7 @@
#include "ubd.h"
#include <os.h>
#include "cow.h"
+#include "timetravel.h"
/* Max request size is determined by sector mask - 32K */
#define UBD_MAX_REQUEST (8 * sizeof(long))
@@ -456,6 +457,17 @@ static int bulk_req_safe_read(
return n;
}
+static void finalize_request(struct io_thread_req *io_req)
+{
+ if ((io_req->error == BLK_STS_NOTSUPP) &&
+ (req_op(io_req->req) == REQ_OP_DISCARD)) {
+ blk_queue_max_discard_sectors(io_req->req->q, 0);
+ blk_queue_max_write_zeroes_sectors(io_req->req->q, 0);
+ }
+ blk_mq_end_request(io_req->req, io_req->error);
+ kfree(io_req);
+}
+
/* Called without dev->lock held, and only in interrupt context. */
static void ubd_handler(void)
{
@@ -479,13 +491,7 @@ static void ubd_handler(void)
}
for (count = 0; count < n/sizeof(struct io_thread_req *); count++) {
struct io_thread_req *io_req = (*irq_req_buffer)[count];
-
- if ((io_req->error == BLK_STS_NOTSUPP) && (req_op(io_req->req) == REQ_OP_DISCARD)) {
- blk_queue_max_discard_sectors(io_req->req->q, 0);
- blk_queue_max_write_zeroes_sectors(io_req->req->q, 0);
- }
- blk_mq_end_request(io_req->req, io_req->error);
- kfree(io_req);
+ finalize_request(io_req);
}
}
}
@@ -1136,6 +1142,17 @@ static int __init ubd_driver_init(void){
/* Letting ubd=sync be like using ubd#s= instead of ubd#= is
* enough. So use anyway the io thread. */
}
+
+#ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT
+ /* do not initialize asynchronous io-thread and corresponding irq
+ * in inf-cpu or ext time travel, as we need synchronous io logic
+ */
+
+ if (time_travel_mode == TT_MODE_INFCPU ||
+ time_travel_mode == TT_MODE_EXTERNAL)
+ return 0;
+#endif
+
stack = alloc_stack(0, 0);
io_pid = start_io_thread(stack + PAGE_SIZE, &thread_fd);
if(io_pid < 0){
@@ -1312,8 +1329,11 @@ static struct io_thread_req *ubd_alloc_req(struct ubd *dev, struct request *req,
return io_req;
}
+static void do_io(struct io_thread_req *req, struct io_desc *desc);
+
static int ubd_submit_request(struct ubd *dev, struct request *req)
{
+ int i;
int segs = 0;
struct io_thread_req *io_req;
int ret;
@@ -1334,6 +1354,17 @@ static int ubd_submit_request(struct ubd *dev, struct request *req)
if (segs)
ubd_map_req(dev, io_req, req);
+#ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT
+ //do the request sychronous (bypass io_thread and ubd_handler)
+ if (time_travel_mode == TT_MODE_INFCPU ||
+ time_travel_mode == TT_MODE_EXTERNAL) {
+ for (i = 0; !io_req->error && i < io_req->desc_cnt; i++)
+ do_io(io_req, &io_req->io_desc[i]);
+ finalize_request(io_req);
+ return 0;
+ }
+#endif
+
ret = os_write_file(thread_fd, &io_req, sizeof(io_req));
if (ret != sizeof(io_req)) {
if (ret != -EAGAIN)
--
2.34.1
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply related [flat|nested] 36+ messages in thread* Re: [PATCH RFC 01/11] um: Make UBD requests synchronous in TT ext/infcpu mode
2023-11-03 16:41 ` [PATCH RFC 01/11] um: Make UBD requests synchronous in TT ext/infcpu mode Benjamin Beichler
@ 2023-11-06 20:24 ` Johannes Berg
0 siblings, 0 replies; 36+ messages in thread
From: Johannes Berg @ 2023-11-06 20:24 UTC (permalink / raw)
To: Benjamin Beichler, Richard Weinberger, Anton Ivanov; +Cc: linux-um
On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
> The UBD driver employs multiple threads to enhance block device accesses
> in userspace. These threads communicate via pipes and are triggered by
> interrupts that utilize the SIGIO handler.
>
> However, in TT mode, both inf-cpu and external modes, this asynchronous,
> multithreaded request processing lead to issues where requests are not
> processed. This occurs because there is no dedicated time travel handler
> for the UBD interrupt.
>
> Since asynchronous, multithreaded request processing does not provide
> substantial benefits in time travel mode and may even introduce
> additional overhead (as multiple threads are scheduled sequentially to
> execute requests in TT mode with infinite CPU power), this patch
> switches to synchronous request processing directly within the
> submit_request call for the respective TT modes.
This makes perfect sense. We mostly use hostfs, but recently did the
exact same thing there.
> +#ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT
> + /* do not initialize asynchronous io-thread and corresponding irq
> + * in inf-cpu or ext time travel, as we need synchronous io logic
> + */
> +
> + if (time_travel_mode == TT_MODE_INFCPU ||
> + time_travel_mode == TT_MODE_EXTERNAL)
Maybe we should finally refactor these checks though ... :)
And you don't need the ifdef, because time_travel_mode is defined to
TT_MODE_OFF without CONFIG_UML_TIME_TRAVEL_SUPPORT, and then this just
gets optimised out.
> static int ubd_submit_request(struct ubd *dev, struct request *req)
> {
> + int i;
> int segs = 0;
> struct io_thread_req *io_req;
> int ret;
> @@ -1334,6 +1354,17 @@ static int ubd_submit_request(struct ubd *dev, struct request *req)
> if (segs)
> ubd_map_req(dev, io_req, req);
>
> +#ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT
> + //do the request sychronous (bypass io_thread and ubd_handler)
> + if (time_travel_mode == TT_MODE_INFCPU ||
> + time_travel_mode == TT_MODE_EXTERNAL) {
> + for (i = 0; !io_req->error && i < io_req->desc_cnt; i++)
> + do_io(io_req, &io_req->io_desc[i]);
> + finalize_request(io_req);
> + return 0;
> + }
> +#endif
>
Same here, and in fact with the ifdef you'd get an unused variable
warning, but anyway you could move that into the block (or even into the
for now that we require C99!)
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH RFC 02/11] um: add a simple time_travel_handler implementation
2023-11-03 16:41 [PATCH RFC 00/11] Several Time Travel Mode Enhancements Benjamin Beichler
2023-11-03 16:41 ` [PATCH RFC 01/11] um: Make UBD requests synchronous in TT ext/infcpu mode Benjamin Beichler
@ 2023-11-03 16:41 ` Benjamin Beichler
2023-11-03 16:41 ` [PATCH RFC 03/11] um: Use a simple time travel handler for line interrupts Benjamin Beichler
` (8 subsequent siblings)
10 siblings, 0 replies; 36+ messages in thread
From: Benjamin Beichler @ 2023-11-03 16:41 UTC (permalink / raw)
To: Richard Weinberger, Anton Ivanov, Johannes Berg
Cc: linux-um, Benjamin Beichler
Use in uml drivers, which do not need more than scheduling the interrupt
at the time of a follow up timetravel GET-Message.
Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
---
arch/um/include/shared/irq_user.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/um/include/shared/irq_user.h b/arch/um/include/shared/irq_user.h
index da0f6eea30d0..8c336c79ffb9 100644
--- a/arch/um/include/shared/irq_user.h
+++ b/arch/um/include/shared/irq_user.h
@@ -17,6 +17,13 @@ enum um_irq_type {
struct siginfo;
extern void sigio_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs);
void sigio_run_timetravel_handlers(void);
+
+struct time_travel_event;
+extern void time_travel_add_irq_event(struct time_travel_event *ev);
+static inline void simple_timetravel_handler(int irq, int fd, void *data,
+ struct time_travel_event *ev) {
+ time_travel_add_irq_event(ev);
+}
extern void free_irq_by_fd(int fd);
extern void deactivate_fd(int fd, int irqnum);
extern int deactivate_all_fds(void);
--
2.34.1
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply related [flat|nested] 36+ messages in thread* [PATCH RFC 03/11] um: Use a simple time travel handler for line interrupts
2023-11-03 16:41 [PATCH RFC 00/11] Several Time Travel Mode Enhancements Benjamin Beichler
2023-11-03 16:41 ` [PATCH RFC 01/11] um: Make UBD requests synchronous in TT ext/infcpu mode Benjamin Beichler
2023-11-03 16:41 ` [PATCH RFC 02/11] um: add a simple time_travel_handler implementation Benjamin Beichler
@ 2023-11-03 16:41 ` Benjamin Beichler
2023-11-06 20:26 ` Johannes Berg
2023-11-03 16:41 ` [PATCH RFC 04/11] um: Handle UM_TIMETRAVEL_RUN only in idle loop, signal success in ACK Benjamin Beichler
` (7 subsequent siblings)
10 siblings, 1 reply; 36+ messages in thread
From: Benjamin Beichler @ 2023-11-03 16:41 UTC (permalink / raw)
To: Richard Weinberger, Anton Ivanov, Johannes Berg
Cc: linux-um, Benjamin Beichler
This change permits interrupts on serial lines in time travel mode,
especially in external mode. However, these interrupts are processed
with the simple handler that does not provide any acknowledgment.
Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
---
arch/um/drivers/line.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/um/drivers/line.c b/arch/um/drivers/line.c
index b98545f3edb5..3cda0ae41824 100644
--- a/arch/um/drivers/line.c
+++ b/arch/um/drivers/line.c
@@ -260,9 +260,9 @@ int line_setup_irq(int fd, int input, int output, struct line *line, void *data)
int err;
if (input) {
- err = um_request_irq(UM_IRQ_ALLOC, fd, IRQ_READ,
+ err = um_request_irq_tt(UM_IRQ_ALLOC, fd, IRQ_READ,
line_interrupt, 0,
- driver->read_irq_name, data);
+ driver->read_irq_name, data, simple_timetravel_handler);
if (err < 0)
return err;
@@ -270,9 +270,9 @@ int line_setup_irq(int fd, int input, int output, struct line *line, void *data)
}
if (output) {
- err = um_request_irq(UM_IRQ_ALLOC, fd, IRQ_WRITE,
+ err = um_request_irq_tt(UM_IRQ_ALLOC, fd, IRQ_WRITE,
line_write_interrupt, 0,
- driver->write_irq_name, data);
+ driver->write_irq_name, data, simple_timetravel_handler);
if (err < 0)
return err;
--
2.34.1
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply related [flat|nested] 36+ messages in thread* Re: [PATCH RFC 03/11] um: Use a simple time travel handler for line interrupts
2023-11-03 16:41 ` [PATCH RFC 03/11] um: Use a simple time travel handler for line interrupts Benjamin Beichler
@ 2023-11-06 20:26 ` Johannes Berg
2023-11-10 16:53 ` Benjamin Beichler
0 siblings, 1 reply; 36+ messages in thread
From: Johannes Berg @ 2023-11-06 20:26 UTC (permalink / raw)
To: Benjamin Beichler, Richard Weinberger, Anton Ivanov; +Cc: linux-um
On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
> This change permits interrupts on serial lines in time travel mode,
> especially in external mode. However, these interrupts are processed
> with the simple handler that does not provide any acknowledgment.
>
Yeah... we had this discussion in the other thread.
At least in my mental model this is broken because the sender of the
event basically has to prepare the calendar for it happening, which
feels ... odd.
But I can see where you're coming from, and switching to virtio console
that does it all right might be tricky, so ... perhaps this still makes
some sense.
I'd still think we should put a warning into the code somewhere when you
actually use line interrupts though?
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH RFC 03/11] um: Use a simple time travel handler for line interrupts
2023-11-06 20:26 ` Johannes Berg
@ 2023-11-10 16:53 ` Benjamin Beichler
2023-11-10 17:16 ` Johannes Berg
0 siblings, 1 reply; 36+ messages in thread
From: Benjamin Beichler @ 2023-11-10 16:53 UTC (permalink / raw)
To: Johannes Berg, Richard Weinberger, Anton Ivanov; +Cc: linux-um
Am 06.11.2023 um 21:26 schrieb Johannes Berg:
> ________________________________
>
> On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
>> This change permits interrupts on serial lines in time travel mode,
>> especially in external mode. However, these interrupts are processed
>> with the simple handler that does not provide any acknowledgment.
>>
> Yeah... we had this discussion in the other thread.
>
> At least in my mental model this is broken because the sender of the
> event basically has to prepare the calendar for it happening, which
> feels ... odd.
Actually, for me, this would make kind of sense. Why couldn't we be more
explicit in the time travel protocol to the calendar about interrupts?
We could use the ID from the start-msg to identify um instances and
tell, in an additional msg, that we are going to trigger an interrupt at
the current time to that instance from the device simulation.
Another way would be, to serve such information (if the device
simulation really need it) over the facilities we already established.
Why not send the ACK as MSG_OOB or ancillary data?
>
> But I can see where you're coming from, and switching to virtio console
> that does it all right might be tricky, so ... perhaps this still makes
> some sense.
>
> I'd still think we should put a warning into the code somewhere when you
> actually use line interrupts though?
>
> johannes
>
Benjamin
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH RFC 03/11] um: Use a simple time travel handler for line interrupts
2023-11-10 16:53 ` Benjamin Beichler
@ 2023-11-10 17:16 ` Johannes Berg
2023-11-13 11:46 ` Benjamin Beichler
0 siblings, 1 reply; 36+ messages in thread
From: Johannes Berg @ 2023-11-10 17:16 UTC (permalink / raw)
To: Benjamin Beichler, Richard Weinberger, Anton Ivanov; +Cc: linux-um
On Fri, 2023-11-10 at 17:53 +0100, Benjamin Beichler wrote:
> >
> > At least in my mental model this is broken because the sender of the
> > event basically has to prepare the calendar for it happening, which
> > feels ... odd.
> Actually, for me, this would make kind of sense.
I'm not sure I agree, I guess. To me, the sender is just telling someone
else "here's something going on I want you to look at". That doesn't
really mean the receiver needs to actually handle it "right away". Maybe
the receiver implements a model where it's modelling an interrupt
latency (which _would_ be more realistic), so it only handles the
message some (small amount of) time later. Maybe it's in a polling model
right now and has disabled interrupt sources, and just wakes every so
often to check for messages. Maybe something else.
So I don't really agree that baking a "must be scheduled immediately"
into any protocol really makes sense.
> Why couldn't we be more
> explicit in the time travel protocol to the calendar about interrupts?
> We could use the ID from the start-msg to identify um instances and
> tell, in an additional msg, that we are going to trigger an interrupt at
> the current time to that instance from the device simulation.
But see above - I basically just don't think the sender of the event is
the right position to know that information.
> Another way would be, to serve such information (if the device
> simulation really need it) over the facilities we already established.
> Why not send the ACK as MSG_OOB or ancillary data?
Don't think I understand that. Which "facilities" are you referring to
here?
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH RFC 03/11] um: Use a simple time travel handler for line interrupts
2023-11-10 17:16 ` Johannes Berg
@ 2023-11-13 11:46 ` Benjamin Beichler
2023-11-13 21:22 ` Johannes Berg
0 siblings, 1 reply; 36+ messages in thread
From: Benjamin Beichler @ 2023-11-13 11:46 UTC (permalink / raw)
To: Johannes Berg, Richard Weinberger, Anton Ivanov; +Cc: linux-um
Am 10.11.2023 um 18:16 schrieb Johannes Berg:
> ________________________________
>
> On Fri, 2023-11-10 at 17:53 +0100, Benjamin Beichler wrote:
>>> At least in my mental model this is broken because the sender of the
>>> event basically has to prepare the calendar for it happening, which
>>> feels ... odd.
>> Actually, for me, this would make kind of sense.
> I'm not sure I agree, I guess. To me, the sender is just telling someone
> else "here's something going on I want you to look at". That doesn't
> really mean the receiver needs to actually handle it "right away". Maybe
> the receiver implements a model where it's modelling an interrupt
> latency (which _would_ be more realistic), so it only handles the
> message some (small amount of) time later. Maybe it's in a polling model
> right now and has disabled interrupt sources, and just wakes every so
> often to check for messages. Maybe something else.
>
> So I don't really agree that baking a "must be scheduled immediately"
> into any protocol really makes sense.
I think you are mixing here some levels of abstraction. Some entity in
the "system" needs to
react to an interrupt immediately. In the real world, this is mostly
done by some kind of interrupt controller
which may latch the interrupt (if they are deactivated from CPU side).
Of course some more sophisticated
bus protocols (i.e. virtio/vhost in this case) can also signal the
device to do no interrupts, but in the most
basic case, e.g. a button at a GPIO pin, the device could not handle
that. Most of the interrupt abstraction
in um is realized via signals, but some more seems (IMHO) to be needed
from the tt-protocol, if we have external
device simulations. My suggestion was to add the virtual interrupt line
state change as a message to the calendar.
Of course, you could argue, that the device simulation needs to be
clever enough (so distributing this task of the
interrupt controller), but even therefore the device simulation need
some more information to reflect delayed interrupts,
deactivated interrupts and so on.
What confuses me is, that you explicitly use the immediate synchronous
response model on the interrupts from the vhost/virtio
driver, but you seem to refuse this on other drivers.
>
>> Why couldn't we be more
>> explicit in the time travel protocol to the calendar about interrupts?
>> We could use the ID from the start-msg to identify um instances and
>> tell, in an additional msg, that we are going to trigger an interrupt at
>> the current time to that instance from the device simulation.
> But see above - I basically just don't think the sender of the event is
> the right position to know that information.
Mhh if I apply that to the vhost/virtio simulations, that is, what is
implicitly modeled by
the exchanged file descriptors and vhost/virtio protocol state.
We may change the whole tt-ext-mode to only accept vhost/virtio drivers,
but that needs to
be stated explicitly in the documentation. I'm not really in favor of
that, but if it is documented
it is easier to understand.
>
>> Another way would be, to serve such information (if the device
>> simulation really need it) over the facilities we already established.
>> Why not send the ACK as MSG_OOB or ancillary data?
> Don't think I understand that. Which "facilities" are you referring to
> here?
I hadn't researched that on Friday, but we could use a custom control
msg (cmsg) on the unix sockets
attached to the (serial) line devices. The device simulation can then
wait on that ancillary data for
synchronous interrupt handling, as you have proposed for vhost/virtio.
My first idea was to use
MSG_OOB, which do not exit on Unix-Domain-Sockets. If someone does not
use uds as transport
we may give a warning.
Benjamin
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH RFC 03/11] um: Use a simple time travel handler for line interrupts
2023-11-13 11:46 ` Benjamin Beichler
@ 2023-11-13 21:22 ` Johannes Berg
2023-11-13 21:57 ` Johannes Berg
0 siblings, 1 reply; 36+ messages in thread
From: Johannes Berg @ 2023-11-13 21:22 UTC (permalink / raw)
To: Benjamin Beichler, Richard Weinberger, Anton Ivanov; +Cc: linux-um
On Mon, 2023-11-13 at 12:46 +0100, Benjamin Beichler wrote:
> > So I don't really agree that baking a "must be scheduled immediately"
> > into any protocol really makes sense.
> I think you are mixing here some levels of abstraction. Some entity in
> the "system" needs to react to an interrupt immediately.
Don't think I agree, that's not really necessary.
> In the real world, this is mostly
> done by some kind of interrupt controller
> which may latch the interrupt (if they are deactivated from CPU side).
Even that itself may just be polling some lines, theoretically.
But that's mostly theoretical - if you wanted to simulate some interrupt
latency, you'd simply not react 'immediately' but rather some amount of
time later; the interrupt controller and all that is basically all the
same piece of code.
> Of course some more sophisticated
> bus protocols (i.e. virtio/vhost in this case) can also signal the
> device to do no interrupts, but in the most
> basic case, e.g. a button at a GPIO pin, the device could not handle
> that.
Not sure I follow - how's that related to the protocol? Either you get
an interrupt or you don't, but once you do get it, there's also nothing
that says you will handle it immediately.
> Most of the interrupt abstraction in um is realized via signals,
Which is actually unnecessary for this kind of simulation since the UML
instance must be in idle by definition for someone else to be running
and sending it 'real' interrupts. We could as well just directly build
an epoll loop there, but that's more complicated surgery in the innards
of UML's interrupt model that isn't really needed.
Without time-travel, it _is_ needed since you want to have 'real'
interrupts that actually stop a process from running and you get into
the device driver and all.
> but some more seems (IMHO) to be needed
> from the tt-protocol, if we have external
> device simulations.
Well, I just said above _less_ is actually needed ;-)
The _more_ that's needed is to actually ensure the "only one thing is
running" part, which I've built into the system via acknowledgement (or
perhaps requirement for those).
> My suggestion was to add the virtual interrupt line
> state change as a message to the calendar.
Yes but it doesn't work unless the other side _already_ knows that it
will happen, because it broke the rule of "only one thing is running".
Again you can argue that it's fine to return to the calendar even if the
receiving process won't actually do anything, but because the receiving
process is still thinking about it, you end up with all the contortions
you have to do in patch 4 and 7 ... because even the _calendar_ doesn't
know when the request will actually come to it.
Perhaps one way of handling this that doesn't require all those
contortions would be for the sender of the event to actually
_completely_ handle the calendar request for the interrupt on behalf of
the receiver, so that the receiver doesn't actually do _anything_ but
mark "this IRQ is pending" in this case. Once it actually gets a RUN
message it will actually start running since it assumes that it will not
get a RUN message without having requested a calendar entry. If the
calendar entry were already handle on its behalf, you'd not need the
request and therefore not need the special handling for the request from
patch 4.
You'd need a different implementation of patches 2/3, and get rid of
patches 4, 6, 7 and 8.
So ... thinking about that more, the issue isn't so much that you're
making an assumption that the interrupt should happen right away, it's
that only some parts of your system are making that assumption (the
sender of the event that causes the interrupt), and then you're leaving
the rest of the system to catch up with something that actually gets
processed _asynchronously_, causing all kinds of trouble.
> Of course, you could argue, that the device simulation needs to be
> clever enough (so distributing this task of the
> interrupt controller), but even therefore the device simulation need
> some more information to reflect delayed interrupts,
> deactivated interrupts and so on.
>
> What confuses me is, that you explicitly use the immediate synchronous
> response model on the interrupts from the vhost/virtio
> driver, but you seem to refuse this on other drivers.
I didn't mean to _refuse_ it, and actually you'll note that I made the
usfstl device abstraction able to trivially implement interrupt latency
by setting a single parameter (named "interrupt_latency" :-) ), so it's
been something that was definitely on my mind, but so far no testing
scenario has needed it, and the simulation runs faster without it ;-)
So no, I'm not actually trying to tell you that you must implement
interrupt latency. Sorry I came across that way.
I was just trying to use the interrupt latency as a reasoning for why
the recipient should be in control, but now having thought about it
more, I don't even think the recipient _needs_ to be in control, just
that one side needs to be _fully_ in control, and the model you've built
has both sides in control and racing against each other (and in the case
of UML even with itself internally processing the signal.)
> > > Why couldn't we be more
> > > explicit in the time travel protocol to the calendar about interrupts?
> > > We could use the ID from the start-msg to identify um instances and
> > > tell, in an additional msg, that we are going to trigger an interrupt at
> > > the current time to that instance from the device simulation.
> > But see above - I basically just don't think the sender of the event is
> > the right position to know that information.
Yeah you know what, I partially retract that statement. I still think
kind of the right way to think about it is to have the recipient in
control, but there are indeed things like the line interrupts where
perhaps it's better to put the sender in control. It's perhaps not
trivial since you might want to do a few writes? You'd have to combine
that into making a single calendar interrupt processing entry on behalf
of the receiver.
> Mhh if I apply that to the vhost/virtio simulations, that is, what is
> implicitly modeled by the exchanged file descriptors and vhost/virtio
> protocol state.
Not sure I follow? The whole file descriptors and all are just
establishing communication mechanisms?
> We may change the whole tt-ext-mode to only accept vhost/virtio drivers,
> but that needs to
> be stated explicitly in the documentation. I'm not really in favor of
> that, but if it is documented
> it is easier to understand.
See above. I guess we don't have to, if we put a bit more requirement on
the sender.
> > Don't think I understand that. Which "facilities" are you referring to
> > here?
> I hadn't researched that on Friday, but we could use a custom control
> msg (cmsg) on the unix sockets
> attached to the (serial) line devices. The device simulation can then
> wait on that ancillary data for
> synchronous interrupt handling, as you have proposed for vhost/virtio.
> My first idea was to use
> MSG_OOB, which do not exit on Unix-Domain-Sockets. If someone does not
> use uds as transport
> we may give a warning.
Not sure that really works, what would you transfer in the other
direction? Or send an fd and then wait for it to get a message on that
fd or something?
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH RFC 03/11] um: Use a simple time travel handler for line interrupts
2023-11-13 21:22 ` Johannes Berg
@ 2023-11-13 21:57 ` Johannes Berg
2023-11-20 13:42 ` Benjamin Beichler
0 siblings, 1 reply; 36+ messages in thread
From: Johannes Berg @ 2023-11-13 21:57 UTC (permalink / raw)
To: Benjamin Beichler, Richard Weinberger, Anton Ivanov; +Cc: linux-um
On Mon, 2023-11-13 at 22:22 +0100, Johannes Berg wrote:
> > My suggestion was to add the virtual interrupt line
> > state change as a message to the calendar.
>
> Yes but it doesn't work unless the other side _already_ knows that it
> will happen, because it broke the rule of "only one thing is running".
>
> Again you can argue that it's fine to return to the calendar even if the
> receiving process won't actually do anything, but because the receiving
> process is still thinking about it, you end up with all the contortions
> you have to do in patch 4 and 7 ... because even the _calendar_ doesn't
> know when the request will actually come to it.
>
> Perhaps one way of handling this that doesn't require all those
> contortions would be for the sender of the event to actually
> _completely_ handle the calendar request for the interrupt on behalf of
> the receiver, so that the receiver doesn't actually do _anything_ but
> mark "this IRQ is pending" in this case. Once it actually gets a RUN
> message it will actually start running since it assumes that it will not
> get a RUN message without having requested a calendar entry. If the
> calendar entry were already handle on its behalf, you'd not need the
> request and therefore not need the special handling for the request from
> patch 4.
> You'd need a different implementation of patches 2/3, and get rid of
> patches 4, 6, 7 and 8.
>
So maybe for my future self and all the bystanders, I'll try to explain
how I see the issue that causes patches 4, 6, 7 and 8 to be needed:
Actors: UML, ES (Event Sender), Cal (Calendar)
In UML, we're using a line-based protocols as in drivers/line.c
The ES is connected to that line and can send it messages.
(The same scenario would probably apply the other way around, in theory,
so we might need to have a way to implement this with roles reversed.)
Let's start by sending a message to UML, that's the whole point of this
discussion:
ES ---> UML
// so far nothing happened, the host kernel queued SIGIO to UML
ES: continues running until idle and returns to Cal
Now we already have a few options, and I don't know which one got
implemented.
A. Perhaps ES also told Cal to expect that UML is running:
Cal ---RUN--> UML
B. Perhaps Cal just asks everyone what their current request is
Cal ---GET--> UML
C. Perhaps something else?
In any case, we already see a first race here, let's say A happened, and
now:
! SIGIO -> UML
UML calls simple_timetravel_handler -> time_travel_add_irq_event
UML ---REQUEST--> Cal
UML <----RUN----- Cal
// UML really confused -> patch 4
Or maybe:
UML <-- RUN ----- CAL
UML: starts running, starts manipulating time event list
! SIGIO -> UML
UML calls simple_timetravel_handler -> time_travel_add_irq_event
manipulates time event list
// UML corrupts data structures -> patch 8
Or:
UML <-- RUN ------ CAL
UML runs only really briefly, sends new request,
time_travel_ext_prev_request_valid = true
! SIGIO -> UML
UML calls simple_timetravel_handler -> time_travel_add_irq_event
time_travel_ext_prev_request_valid == true
// no new request, Cal confused -> patch 6, and maybe 7
Not sure if that's really all completely accurate, and almost certainly
there are more scenarios that cause issues?
But in all cases the root cause is the asynchronous nature of doing
this; partially internally in UML (list protection etc.) and partially
outside UML (calendar doesn't know what's supposed to happen next until
async SIGIO is processed.)
In contrast, with virtio, you get
ES -- event ----> UML
// so far nothing happened, the host kernel queued SIGIO to UML
// ES waits for ACK
! SIGIO -> UML
UML calls simple_timetravel_handler -> time_travel_add_irq_event
UML ---REQUEST--> Cal
UML <--ACK------- Cal
ES <---ACK------- UML
ES: continues running until idle and returns to Cal
Cal: schedules next runnable entry, likely UML
Note how due to the "// ES waits for ACK" nothing bad happens, because
even if the host doesn't schedule the SIGIO immediately to the UML
process, or that needs some time, _nothing_ else in the simulation makes
progress either until UML has.
IMNSHO that's the far simpler model than taking into account all the
potential races of which I outlined some above and trying to work with
them.
Now then I went on to say that we could just basically make it _all_ the
sender's responsibility on behalf of the receiver, and then we'd get
ES -- event ----> UML
// so far nothing happened, the host kernel queued SIGIO to UML
ES -- add UML --> Cal
ES <--- ACK ----- Cal
ES: continues running u8ntil idle and returns to Cal
Cal: schedules next runnable entry, likely UML
Now the only thing we need to handle in terms of concurrency are two
scenarios that continue from that scenario:
1.
Cal --- RUN ----> UML
! SIGIO -> UML
and
2.
! SIGIO -> UML
Cal --- RUN ----> UML
Obviously 2 is what you expect, but again you can even have races like
in 1 and the SIGIO can happen at roughly any time.
I'm tempted to say the only reasonable way to handle that would be to
basically not do _anything_ in the SIGIO but go poll all the file
descriptors this might happen for upon receiving a RUN message.
We could even go so far as to add a new RUN_TRIGGERED_BY_OTHER message
that would make it clear that someone else entered the entry into the
calendar, and only then epoll for interrupts for this message, but I'm
not sure that's needed or makes sense (if you were going to wake up
anyway at that time, you'd still handle interrupts.)
In any case, that feels like a _far_ more tractable problem, and the
only concurrency would be between running and the SIGIO, where the SIGIO
basically no longer matters.
However ... if we consider this the other way around, we can actually
see that it's much harder to implement than it sounds - now suddenly
instead of having to connect the sockets with each other etc. you also
have to give the implementation knowledge about who is on the other
side, and how to even do a calendar request on their behalf! We don't
even have a protocol for doing a request on someone else's behalf today
(though with the shared memory you could just enter one), and actually
changing the protocol to support it might even be tricky since it
doesn't have much space for extra data in the messages ... maybe if we
split 'op' or use 'seq' for it... But you'd need to have a (pre-
assigned?) ID for each, or have them exchange IDs over something,
ideally the socket that's actually used for communication, but that
limits the kinds of sockets you can use, etc. So it's by no means
trivial either. And I understand that. I just don't think making the
protocol and implementations handle all the races that happen when you
start doing things asynchronously is really a good idea.
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread* Re: [PATCH RFC 03/11] um: Use a simple time travel handler for line interrupts
2023-11-13 21:57 ` Johannes Berg
@ 2023-11-20 13:42 ` Benjamin Beichler
2023-11-24 14:53 ` Johannes Berg
0 siblings, 1 reply; 36+ messages in thread
From: Benjamin Beichler @ 2023-11-20 13:42 UTC (permalink / raw)
To: Johannes Berg, Richard Weinberger, Anton Ivanov; +Cc: linux-um
Am 13.11.2023 um 22:57 schrieb Johannes Berg:
>
> So maybe for my future self and all the bystanders, I'll try to
> explain how I see the issue that causes patches 4, 6, 7 and 8 to be
> needed:
--- snip ---
Hello Johannes,
Sorry for my delayed response to your detailed email. I find it quite
hard to discuss such a complex topic via mailing lists without it
sounding impolite.
Maybe also as some basis for my reasoning: I'm quite familiar with
discrete event-based simulation, with a special focus on SystemC
simulation.There are
some common (old) principles of DES that map perfectly to the time
travel mode, so my mental model has always been shaped around those
constraints.
- Every instance/object/element in a DES has some inputs/signals that
activate it either at the current moment or at some later point in time.
- Every instance/object/element in a DES will run, starting from its
activation time until it has "finished", without advancing the global
simulation time.
- Events (or activations) occurring at the exact same point in
simulation time would happen in parallel. Therefore, the actual order of
execution in a
sequential simulator is more or less unimportant (though some
implementations may require a deterministic order, ensuring simulations
with the same
parameters yield the exact same output, e.g., SystemC). The parallel
execution of events at the same time in the simulation is an
optimization that
may be implemented, but is not always utilized.
After reviewing all your analyses, I believe the most significant
difference in my implementation lies in the last point. I did not
enforce the order
of message processing when they occur exactly at the same simulation
time. Consequently, I modified my implementation to eliminate the
synchronous
(and, to be honest, quite hacky) read operation with special handling on
the timetravel socket. Instead, I implemented a central epoll routine, which
is called by my master simulation kernel (NS3). My rationale was that if
I haven't received the request from the TT-protocol, I cannot advance time.
In conjunction with running not a single UML instance but many (my
current use case consists of a 10-node pair == 20 UML nodes), this can
create all
sorts of race/deadlock conditions, which we have identified.
For me, the ACK of vhost/virtio seemed somewhat redundant, as it
provides the same information as the TT-protocol, assuming my device
simulation
resides within the scheduler. I must admit my assumption was incorrect,
primarily because the implementation of the TT-protocol in the kernel is
somewhat fragile and, most importantly, your requirement that
***nothing***is allowed to interfere (creating SIGIOs) at certain states
of a time-traveling
UML instance is not well-documented. We need the ACK not for an
acknowledgment of registering the interrupt but to know that we are
allowed to send the
next TT-msg. This very tight coupling of these two protocols does not
appear to be the best design or, at the very least, is poorly documented.
The prohibition of interference in certain TT-states also led to my
second mistake. I relaxed my second DES requirement and allowed
interrupts when
the UML instance is in the RUN-state. This decision was based on the
impression that UML was built to work this way without TT, so why should it
break when in TT-Mode (which you proved was wrong). Whether this is
semantically reasonable or not can be questioned, but it triggered technical
problems with the current implementation.
With this realization, I tend to agree that maybe the whole patches to
ensure thread-safe (or reentrant-safe access) of the event list might be
dropped.
Still, we should ensure that the SIGIO is simply processed synchronously
in the idle loop. This aligns with my last DES constraint: since everything
happens at the same moment in simulation time, we do not need "real"
interrupts but can process interrupts (SIGIOs) later (but at the same
simulation time).
I think this approach only works in ext or cpu-inf mode and may be
problematic in "normal" timetravel mode. I might even consider dropping
the signal handler,
which marks the interrupts pending, and process the signals with a
signalfd, but that is, again, only an optimization.
Additionally, to address the interrupt acknowledgment for the serial
line, I'd like to propose this: why not add an extra file descriptor in the
command line, which is something the kernel could write to, such as an
eventfd or a pipe, to signal the acknowledgment of the interrupt. For
example, the
command line changes to ssl0=fd:0,fd:1,fd:3. If somebody uses the serial
line driver with timetravel mode but without that acknowledgment fd, we
can emit
a warning or an error.
I believe all these changes should work well with the shared memory
optimization and should make the entire time travel ext protocol a bit
more robust,
easier to use, and harder to misuse. ;-)
However, even after the lengthy discussion on "when" interrupts should
be processed, I still hold the opinion that the response to raising an
interrupt
should be immediate to the device simulation and not delayed in
simulation time. This only makes the device simulation harder without
real benefit. If
you want to delay the interrupt handling (ISR and so on), that's still
possible and in both situations highly dependent on the UML
implementation. If
we want to add an interrupt delay, we need to implement something in UML
anyway. If you want to delay it in the device driver, you can also always do
it, but you are not at the mercy of some hard-to-determine extra delay
from UML.
Overall, if you think that makes sense, I could start on some patches,
or perhaps you feel more comfortable doing that.
Benjamin
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH RFC 03/11] um: Use a simple time travel handler for line interrupts
2023-11-20 13:42 ` Benjamin Beichler
@ 2023-11-24 14:53 ` Johannes Berg
0 siblings, 0 replies; 36+ messages in thread
From: Johannes Berg @ 2023-11-24 14:53 UTC (permalink / raw)
To: Benjamin Beichler, Richard Weinberger, Anton Ivanov; +Cc: linux-um
Hi,
> Sorry for my delayed response to your detailed email. I find it quite
> hard to discuss such a complex topic via mailing lists without it
> sounding impolite.
Heh, no worries about either :)
> Maybe also as some basis for my reasoning: I'm quite familiar with
> discrete event-based simulation, with a special focus on SystemC
> simulation.
Yeah OK, fair. I've only tangentially heard about SystemC :)
> There are
> some common (old) principles of DES that map perfectly to the time
> travel mode, so my mental model has always been shaped around those
> constraints.
> - Every instance/object/element in a DES has some inputs/signals that
> activate it either at the current moment or at some later point in
time.
> - Every instance/object/element in a DES will run, starting from its
> activation time until it has "finished", without advancing the global
> simulation time.
> - Events (or activations) occurring at the exact same point in
> simulation time would happen in parallel. Therefore, the actual order
of
> execution in a
> sequential simulator is more or less unimportant (though some
> implementations may require a deterministic order, ensuring
simulations
> with the same
> parameters yield the exact same output, e.g., SystemC). The parallel
> execution of events at the same time in the simulation is an
> optimization that
> may be implemented, but is not always utilized.
Sure.
> After reviewing all your analyses, I believe the most significant
> difference in my implementation lies in the last point. I did not
> enforce the order
> of message processing when they occur exactly at the same simulation
> time.
I'm not sure I fully agree with this - yes there's an ordering aspect to
it wrt. processing events that happen at the same simulation time, but
I'm not even sure that's the worst part of what you saw. Given the use
of SIGIO, the harder part is that you don't even have a guarantee that
an event is processed _at all_ at the same simulation time it was sent.
It might get processed later, at a different simulation time.
> Consequently, I modified my implementation to eliminate the
> synchronous
> (and, to be honest, quite hacky) read operation with special handling
on
> the timetravel socket. Instead, I implemented a central epoll routine,
which
> is called by my master simulation kernel (NS3). My rationale was that
if
> I haven't received the request from the TT-protocol, I cannot advance
time.
Yes, but even then I believe that a SIGIO event could be delayed to be
processed at a later simulation time than it should be.
> In conjunction with running not a single UML instance but many (my
> current use case consists of a 10-node pair == 20 UML nodes), this can
> create all
> sorts of race/deadlock conditions, which we have identified.
FWIW, I'm typically running 5-6 nodes with a simulated WiFi device each,
so the scale isn't that much different.
> For me, the ACK of vhost/virtio seemed somewhat redundant, as it
> provides the same information as the TT-protocol,
Actually, no, I don't agree that it provides the same information. The
ACK provides - to the sender of the event - the acknowledgement that the
event was actually seen. However, I will say that it's actually more a
consequence of UML using SIGIO than a fundamental part of the
simulation. If you were actually able to say "UML can do nothing upon
exiting idle state without checking for events first", you'd probably be
correct. However, that's an incorrect assumption given how UML's
interrupt model is implemented now.
So yes, we could remove the SIGIO model from UML when time-travel=ext or
=infcpu is in use, but that'd be a much more invasive rework.
However, if we'd actually do such a rework, then I believe you'd be
_almost_ correct with that sentence. The only other remaining lack of
information would be an updated free-until value, though you could
assume that the receiver of the event always wants to process it
immediately and update free-until=now from the sender of the event.
> assuming my device simulation
> resides within the scheduler.
I'm not sure that really makes a big difference. You could always set it
up in a way that the sender of the event causes free-until=now and
returns to the scheduler, though the scheduler would then have to
actually ask everyone for their current request [1] if it doesn't have
information about who an event was sent to.
[1] and each participant in the simulation would have to check for
pending events before answering such a question
> I must admit my assumption was incorrect,
> primarily because the implementation of the TT-protocol in the kernel
is
> somewhat fragile and, most importantly, your requirement that
> ***nothing***is allowed to interfere (creating SIGIOs) at certain
states
> of a time-traveling
> UML instance is not well-documented.
I guess I don't agree with it being fragile, but that's OK :)
You make it sound like that's an inherent fault of the protocol, when
really it's a consequence of the distributed scheduling model, IMHO.
And I think maybe this is also where SystemC-based intuition fails,
because things are distributed here.
> We need the ACK not for an
> acknowledgment of registering the interrupt but to know that we are
> allowed to send the
> next TT-msg. This very tight coupling of these two protocols does not
> appear to be the best design or, at the very least, is poorly
documented.
I think you're mixing up the design and the implementation a bit. We do
need the ACK for saying that the interrupt was actually registered,
because otherwise - given SIGIO - we have no chance to know it was
processed synchronously. But I think I've described the possible
variants of the model above already.
If you didn't use SIGIO and processed events and TT-msgs in the same
mainloop, you'd not have to worry about the states, and that's true of
all of my device implementations, see e.g. network in my scheduler. But
that still doesn't get rid of the requirement for ACKs given how we want
to have a free-until optimisation (and thus a form of distributed
scheduling.)
> The prohibition of interference in certain TT-states also led to my
> second mistake. I relaxed my second DES requirement and allowed
> interrupts when the UML instance is in the RUN-state.
I tend to think this was also another bug though, as I described in my
other email this can lead to events not even registering at the same
simulation time they were sent at, due to the async nature of SIGIO and
lack of polling when exiting WAIT state.
> This decision was based on the
> impression that UML was built to work this way without TT, so why
should it
> break when in TT-Mode (which you proved was wrong). Whether this is
> semantically reasonable or not can be questioned, but it triggered
technical
> problems with the current implementation.
>
> With this realization, I tend to agree that maybe the whole patches to
> ensure thread-safe (or reentrant-safe access) of the event list might
be
> dropped.
OK.
> Still, we should ensure that the SIGIO is simply processed
synchronously
> in the idle loop.
I think you mean something else, but as written, this is wrong. You
cannot "process SIGIO", but you can (otherwise) notice (poll for) the
event that caused SIGIO, or even get rid of the SIGIO entirely.
So I think you mean "all events are processed synchronously in or on
exit from the idle loop", and then I tend to agree, but that's a pretty
big departure from how interrupt processing works in UML normally.
> This aligns with my last DES constraint: since everything
> happens at the same moment in simulation time, we do not need "real"
> interrupts but can process interrupts (SIGIOs) later (but at the same
> simulation time).
(again, not SIGIO, but events on FDs)
Well, it's a trade-off. Yes we could do this, but the only advantage is
getting rid of the ACK messages, which are also needed for the
distributed scheduling model.
In a SystemC model I believe everything happens in a single process, so
sending an event basically can influence the scheduler already? Or
perhaps "scheduling" in that model doesn't actually just consist of
picking the next task from an existing list, but having each task check
for events first?
Either way, I'm not sure this works in the distributed model here.
> I think this approach only works in ext or cpu-inf mode and may be
> problematic in "normal" timetravel mode.
Yes, for sure, 'normal' time-travel is not suitable for this kind of
simulation, and vice versa.
> I might even consider dropping the signal handler,
> which marks the interrupts pending, and process the signals with a
> signalfd, but that is, again, only an optimization.
No, if you want to get away from the ACK requirement, you _have_ to drop
the signal handler, or at least make it do nothing, and understand that
events were queued in a different way, synchronously. The async nature
of the signal handlers causes all the problems we've been discussing -
unless, as I have done, you always wait for them to finish first with
the ACK requirement.
> Additionally, to address the interrupt acknowledgment for the serial
> line, I'd like to propose this: why not add an extra file descriptor
in the
> command line, which is something the kernel could write to, such as an
> eventfd or a pipe, to signal the acknowledgment of the interrupt. For
> example, the
> command line changes to ssl0=fd:0,fd:1,fd:3. If somebody uses the
serial
> line driver with timetravel mode but without that acknowledgment fd,
we
> can emit a warning or an error.
Sure, that works. But above you were arguing against the ACKs and now
you want to implement them? Which is it? ;-)
> I believe all these changes should work well with the shared memory
> optimization and should make the entire time travel ext protocol a bit
> more robust, easier to use, and harder to misuse. ;-)
I'm not sure which changes really are left?
Adding the ACKs to the serial port, sure, that makes sense to me.
Getting rid of the ACKs overall, dunno, maybe it's possible but it
requires a different scheduling model, where _any_ external event makes
the sender (which was running) give up its free-until without knowing
that was needed [2], and then the scheduler has to ask everyone to
process pending events before actually picking the next task to run.
This was a bit implicitly done by your scheduling request message, so
that at that point your UML had time to process events. I think it was
still racy though since it was still SIGIO based, which has no ordering
guarantee vs. the read from the TT socket.
[2] though you argue below it's always needed
> However, even after the lengthy discussion on "when" interrupts should
> be processed, I still hold the opinion that the response to raising an
> interrupt
> should be immediate to the device simulation and not delayed in
> simulation time. This only makes the device simulation harder without
> real benefit. If
> you want to delay the interrupt handling (ISR and so on), that's still
> possible and in both situations highly dependent on the UML
> implementation. If
> we want to add an interrupt delay, we need to implement something in
UML
> anyway. If you want to delay it in the device driver, you can also
always do
> it, but you are not at the mercy of some hard-to-determine extra delay
> from UML.
I think this is really completely orthogonal to the other discussion
apart from what I wrote at [2] above, but I don't necessarily think that
it always makes sense to process all events immediately, or to even
schedule for that. It doesn't even make the implementation harder to
request the time slot for handling the event at now + epsilon instead of
now. But anyway, no need to get into this.
> Overall, if you think that makes sense, I could start on some patches,
> or perhaps you feel more comfortable doing that.
I'm honestly not sure what you had in mind now, apart from the ack-fd on
serial?
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH RFC 04/11] um: Handle UM_TIMETRAVEL_RUN only in idle loop, signal success in ACK
2023-11-03 16:41 [PATCH RFC 00/11] Several Time Travel Mode Enhancements Benjamin Beichler
` (2 preceding siblings ...)
2023-11-03 16:41 ` [PATCH RFC 03/11] um: Use a simple time travel handler for line interrupts Benjamin Beichler
@ 2023-11-03 16:41 ` Benjamin Beichler
2023-11-06 20:53 ` Johannes Berg
2023-11-10 18:41 ` Johannes Berg
2023-11-03 16:41 ` [PATCH RFC 05/11] um: Add final request time to TT wait message Benjamin Beichler
` (6 subsequent siblings)
10 siblings, 2 replies; 36+ messages in thread
From: Benjamin Beichler @ 2023-11-03 16:41 UTC (permalink / raw)
To: Richard Weinberger, Anton Ivanov, Johannes Berg
Cc: linux-um, Benjamin Beichler
The Timetravel socket can be read from the SIGIO handler, which may
catch a RUN message that was expected to be received in the wait loop.
It can happen that the device simulation only "created" one interrupt,
but an uncertain (i.e., hard to predict by the calendar/device
simulation) number of SIGIOs are emitted, with each SIGIO interrupt
producing a GET message. The calendar might try to send a RUN message
after the first GET message, which is then processed in the SIGIO
handler, waiting for the response to a subsequent GET message. However,
time is advanced by the received RUN message. Typically, this doesn't
pose problems because the interrupt implies that the next RUN message
should be at the current time (as sent by the calendar with the GET
message). But there are corner cases, such as simultaneous other
interrupts, which may desynchronize the current time in the UML instance
from the expected time from the calendar. Since there is no real use
case for advancing time in the SIGIO handler with RUN messages (in
contrast to UPDATE messages), we logically only expect time to advance
in the idle loop in TT-ext mode. Therefore, with this patch, we restrict
time changes to the idle loop.
Additionally, since both the idle loop and the signal/interrupt handlers
do blocking reads from the TT socket, a deadlock can occur if a RUN
message intended for the idle loop is received in the SIGIO handler. In
this situation, the calendar expects the UML instance to run, but it
actually waits for another message, either in the SIGIO handler (e.g.,
a second interrupt) or in a poll in the idle loop, as the previous
message was handled by the signal handler, which returned execution to
the main loop and ultimately entered the idle loop.
Therefore, this patch also allows checking whether the current RUN
message was handled by the idle loop or by the signal/interrupt handlers
in the ACK of the RUN.
With the information in the ACK of the RUN message, the calendar knows
whether the RUN was answered in a signal handler and can act
accordingly.
Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
---
arch/um/kernel/time.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
index fddd1dec27e6..ff5ef75bbb94 100644
--- a/arch/um/kernel/time.c
+++ b/arch/um/kernel/time.c
@@ -94,7 +94,10 @@ static void time_travel_handle_message(struct um_timetravel_msg *msg,
case UM_TIMETRAVEL_ACK:
return;
case UM_TIMETRAVEL_RUN:
- time_travel_set_time(msg->time);
+ // the caller time_travel_handle_message will do the appropriate time advance
+ // return here 0, if we got RUN, but are not in idle/ext_wait, to
+ // actually do anything with the run
+ resp.time = (mode != TTMH_READ) ? 1 : 0;
break;
case UM_TIMETRAVEL_FREE_UNTIL:
time_travel_ext_free_until_valid = true;
@@ -238,7 +241,7 @@ static void time_travel_ext_wait(bool idle)
*/
while (msg.op != UM_TIMETRAVEL_RUN)
time_travel_handle_message(&msg, idle ? TTMH_IDLE : TTMH_POLL);
-
+ time_travel_set_time(msg.time);
time_travel_ext_waiting--;
/* we might request more stuff while polling - reset when we run */
--
2.34.1
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply related [flat|nested] 36+ messages in thread* Re: [PATCH RFC 04/11] um: Handle UM_TIMETRAVEL_RUN only in idle loop, signal success in ACK
2023-11-03 16:41 ` [PATCH RFC 04/11] um: Handle UM_TIMETRAVEL_RUN only in idle loop, signal success in ACK Benjamin Beichler
@ 2023-11-06 20:53 ` Johannes Berg
2023-11-10 18:41 ` Johannes Berg
1 sibling, 0 replies; 36+ messages in thread
From: Johannes Berg @ 2023-11-06 20:53 UTC (permalink / raw)
To: Benjamin Beichler, Richard Weinberger, Anton Ivanov; +Cc: linux-um
On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
> The Timetravel socket can be read from the SIGIO handler, which may
> catch a RUN message that was expected to be received in the wait loop.
>
> It can happen that the device simulation only "created" one interrupt,
> but an uncertain (i.e., hard to predict by the calendar/device
> simulation) number of SIGIOs are emitted, with each SIGIO interrupt
> producing a GET message. The calendar might try to send a RUN message
> after the first GET message, which is then processed in the SIGIO
> handler, waiting for the response to a subsequent GET message. However,
> time is advanced by the received RUN message. Typically, this doesn't
> pose problems because the interrupt implies that the next RUN message
> should be at the current time (as sent by the calendar with the GET
> message). But there are corner cases, such as simultaneous other
> interrupts, which may desynchronize the current time in the UML instance
> from the expected time from the calendar. Since there is no real use
> case for advancing time in the SIGIO handler with RUN messages (in
> contrast to UPDATE messages), we logically only expect time to advance
> in the idle loop in TT-ext mode. Therefore, with this patch, we restrict
> time changes to the idle loop.
>
> Additionally, since both the idle loop and the signal/interrupt handlers
> do blocking reads from the TT socket, a deadlock can occur if a RUN
> message intended for the idle loop is received in the SIGIO handler. In
> this situation, the calendar expects the UML instance to run, but it
> actually waits for another message, either in the SIGIO handler (e.g.,
> a second interrupt) or in a poll in the idle loop, as the previous
> message was handled by the signal handler, which returned execution to
> the main loop and ultimately entered the idle loop.
>
> Therefore, this patch also allows checking whether the current RUN
> message was handled by the idle loop or by the signal/interrupt handlers
> in the ACK of the RUN.
>
> With the information in the ACK of the RUN message, the calendar knows
> whether the RUN was answered in a signal handler and can act
> accordingly.
>
This is going to take a bit more review cycles, tbh, still processing it
and probably need to sleep on it :)
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH RFC 04/11] um: Handle UM_TIMETRAVEL_RUN only in idle loop, signal success in ACK
2023-11-03 16:41 ` [PATCH RFC 04/11] um: Handle UM_TIMETRAVEL_RUN only in idle loop, signal success in ACK Benjamin Beichler
2023-11-06 20:53 ` Johannes Berg
@ 2023-11-10 18:41 ` Johannes Berg
1 sibling, 0 replies; 36+ messages in thread
From: Johannes Berg @ 2023-11-10 18:41 UTC (permalink / raw)
To: Benjamin Beichler, Richard Weinberger, Anton Ivanov; +Cc: linux-um
So I guess now you reminded me of this thread ...
> The Timetravel socket can be read from the SIGIO handler, which may
> catch a RUN message that was expected to be received in the wait loop.
>
> It can happen that the device simulation only "created" one interrupt,
> but an uncertain (i.e., hard to predict by the calendar/device
> simulation) number of SIGIOs are emitted, with each SIGIO interrupt
> producing a GET message. The calendar might try to send a RUN message
> after the first GET message, which is then processed in the SIGIO
> handler, waiting for the response to a subsequent GET message. However,
> time is advanced by the received RUN message. Typically, this doesn't
> pose problems because the interrupt implies that the next RUN message
> should be at the current time (as sent by the calendar with the GET
> message). But there are corner cases, such as simultaneous other
> interrupts, which may desynchronize the current time in the UML instance
> from the expected time from the calendar. Since there is no real use
> case for advancing time in the SIGIO handler with RUN messages (in
> contrast to UPDATE messages), we logically only expect time to advance
> in the idle loop in TT-ext mode. Therefore, with this patch, we restrict
> time changes to the idle loop.
OK, that mostly makes sense, so far. Although IIUC now you lost the time
update, so you have to recover from that again somehow when you actually
do the RUN (which we don't do from the SIGIO handler).
> Additionally, since both the idle loop and the signal/interrupt handlers
> do blocking reads from the TT socket, a deadlock can occur if a RUN
> message intended for the idle loop is received in the SIGIO handler. In
> this situation, the calendar expects the UML instance to run, but it
> actually waits for another message, either in the SIGIO handler (e.g.,
> a second interrupt) or in a poll in the idle loop, as the previous
> message was handled by the signal handler, which returned execution to
> the main loop and ultimately entered the idle loop.
I'm not sure I understand this part. How can you get a RUN message when
someone else is still running and sending you interrupts? Or are you
sending interrupts (with time travel handlers because of patch 3?) from
_outside_ of the simulation?! I guess that'd be a case of "don't do that
then"? :)
Or ... I'm still wrapping my head around the system you built ... are
you saying that interrupts are actually delayed and we have something
like this:
Participant A --- message ----------------> UML (socket)
Participant A --- WAIT --> calendar
SIGIO ----> UML
calendar <-- GET --------------------- UML (time_travel_handler)
calendar --- RUN --------------------> UML
(reads "RUN" instead
of GET response)
But then that's again just because "you're doing it wrong", Participant
A should be waiting for an ACK message? At least that was the intent.
You're not using it that way with the line interrupts.
But honestly, I think that's just another reason not to do things the
way you're doing in patches 2 and 3.
> Therefore, this patch also allows checking whether the current RUN
> message was handled by the idle loop or by the signal/interrupt handlers
> in the ACK of the RUN.
>
> With the information in the ACK of the RUN message, the calendar knows
> whether the RUN was answered in a signal handler and can act
> accordingly.
TBH, I don't like this at all. If I'm not completely off with my
analysis above (or it's a similar situation), this just puts workaround
requirements on the calendar because you don't synchronise communication
in _other_ parts of the system properly.
In the simulation that we had intended to build, the exact timing of the
SIGIO wouldn't matter because UML would _always_ be in WAIT state when
getting SIGIO, and would always have to fully process the SIGIO (and
send an ACK on the respective socket) before the other side continues
and we can get to any other part of the system.
That's a pretty fundamental rule of the system because it allows us to
reason how information flows, which part can be doing what at which
time, and (mostly) ignore the inherently asynchronous nature of today's
computing in the system, since things are always locally serialised.
You broke that rule, and the system is coming apart at the seams. You
already found - as I pointed out in the other email - that you now have
to some assumptions on the sender side of the message about how the
interrupt processing will be done on the receiver (although that
assumption is perhaps not more than "may be immediate so I need to
return to the calendar").
You also found that when you broke that rule, you got other asynchronous
things going on, and now need to handle that suddenly you can get a RUN
message inside a SIGIO handler, because that's async and nothing waits
for it to have finished.
I honestly don't believe that removing this rule will do the system good
in the long term. You'll almost certainly find other corner cases - we
still find corner cases in the system _with_ the rule, and that's a
_much_ simpler system.
I also tend to think that the more complex system (without that rule)
will be much harder to optimise. For example, the shared memory table of
run requests (yes yes, I'll send it in a bit), you don't even have a GET
message in time_travel_add_irq_event() any more. In fact you have no
messages at all there! That maybe makes this part easier? But it also
means now you have to reason about why that's actually still OK to not
get the RUN message where you previously expected it, etc.
Or if we consider further optimisations we've been thinking about (but
not implemented yet) - one of them is to use signals for wait/run so we
can hand off running between processes without going through the
calendar (with the calendar still maintaining the new distributed
protocol on behalf of clients that don't participate in a distributed
protocol yet.) Not having the ACK would actually _defeat_ this
optimisation, because you wouldn't actually know who to hand it off to,
only the calendar would be able to do the handoff, since it's the only
one communicating with everyone. With the ACK you know that as soon as
you have the ACK, the shared table must be updated, and you can make the
hand-off decision locally instead of going to the calendar.
So ... I think all of this is an even longer-winded way than the last
discussion over on patch 3 of saying I don't like your simulation model
that doesn't require an ACK, and I don't think we should support it.
FWIW, it really shouldn't be that hard to replace the 'lines' with
virtio serial consoles or so. We've actually done that, we have a
device-side implementation of this:
#define VIRTIO_ID_RPROC_SERIAL 11 /* virtio remoteproc serial link */
(which is really just a virtio console, shows up as /dev/vport*p0 inside
the virtual system, rather than /dev/ssl* or whatever you have there
now.
With the usfstl vhost device abstraction [1] it's less than 150 LOC, and
much of it is boilerplate/struct filling...
[1] https://github.com/linux-test-project/usfstl/blob/main/include/usfstl/vhost.h
Although our implementation only supports a single connection and is
built into whatever generates the messages, if you want multiple UML
instances to communicate you'd have to have an implementation that has a
socket for each and a config file which is connected to where.
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH RFC 05/11] um: Add final request time to TT wait message
2023-11-03 16:41 [PATCH RFC 00/11] Several Time Travel Mode Enhancements Benjamin Beichler
` (3 preceding siblings ...)
2023-11-03 16:41 ` [PATCH RFC 04/11] um: Handle UM_TIMETRAVEL_RUN only in idle loop, signal success in ACK Benjamin Beichler
@ 2023-11-03 16:41 ` Benjamin Beichler
2023-11-06 20:29 ` Johannes Berg
2023-11-03 16:41 ` [PATCH RFC 06/11] um: always send UM_TIMETRAVEL_REQUEST from ISRs Benjamin Beichler
` (5 subsequent siblings)
10 siblings, 1 reply; 36+ messages in thread
From: Benjamin Beichler @ 2023-11-03 16:41 UTC (permalink / raw)
To: Richard Weinberger, Anton Ivanov, Johannes Berg
Cc: linux-um, Benjamin Beichler
Although the information for the next requested time was already sent in
a prior message, this change introduces the otherwise unused time
variable in the TT message. This addition enables an extra consistency
check and can otherwise be ignored.
Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
---
arch/um/kernel/time.c | 6 +++---
include/uapi/linux/um_timetravel.h | 6 +++---
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
index ff5ef75bbb94..abd5fd0f62ee 100644
--- a/arch/um/kernel/time.c
+++ b/arch/um/kernel/time.c
@@ -220,7 +220,7 @@ static bool time_travel_ext_request(unsigned long long time)
return true;
}
-static void time_travel_ext_wait(bool idle)
+static void time_travel_ext_wait(bool idle, unsigned long long ns)
{
struct um_timetravel_msg msg = {
.op = UM_TIMETRAVEL_ACK,
@@ -230,7 +230,7 @@ static void time_travel_ext_wait(bool idle)
time_travel_ext_free_until_valid = false;
time_travel_ext_waiting++;
- time_travel_ext_req(UM_TIMETRAVEL_WAIT, -1);
+ time_travel_ext_req(UM_TIMETRAVEL_WAIT, ns);
/*
* Here we are deep in the idle loop, so we have to break out of the
@@ -256,7 +256,7 @@ static void time_travel_ext_get_time(void)
static void __time_travel_update_time(unsigned long long ns, bool idle)
{
if (time_travel_mode == TT_MODE_EXTERNAL && time_travel_ext_request(ns))
- time_travel_ext_wait(idle);
+ time_travel_ext_wait(idle, ns);
else
time_travel_set_time(ns);
}
diff --git a/include/uapi/linux/um_timetravel.h b/include/uapi/linux/um_timetravel.h
index ca3238222b6d..3127f069d9dc 100644
--- a/include/uapi/linux/um_timetravel.h
+++ b/include/uapi/linux/um_timetravel.h
@@ -77,9 +77,9 @@ enum um_timetravel_ops {
/**
* @UM_TIMETRAVEL_WAIT: Indicate waiting for the previously requested
* runtime, new requests may be made while waiting (e.g. due to
- * interrupts); the time field is ignored. The calendar must process
- * this message and later send a %UM_TIMETRAVEL_RUN message when
- * the host can run again.
+ * interrupts); the time field contains the next requested runtime
+ * for consistency checks. The calendar must process this message and
+ * later send a %UM_TIMETRAVEL_RUN message when the host can run again.
* (host -> calendar)
*/
UM_TIMETRAVEL_WAIT = 3,
--
2.34.1
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply related [flat|nested] 36+ messages in thread* Re: [PATCH RFC 05/11] um: Add final request time to TT wait message
2023-11-03 16:41 ` [PATCH RFC 05/11] um: Add final request time to TT wait message Benjamin Beichler
@ 2023-11-06 20:29 ` Johannes Berg
0 siblings, 0 replies; 36+ messages in thread
From: Johannes Berg @ 2023-11-06 20:29 UTC (permalink / raw)
To: Benjamin Beichler, Richard Weinberger, Anton Ivanov; +Cc: linux-um
On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
> Although the information for the next requested time was already sent in
> a prior message, this change introduces the otherwise unused time
> variable in the TT message. This addition enables an extra consistency
> check and can otherwise be ignored.
Seems fine to me.
> --- a/include/uapi/linux/um_timetravel.h
> +++ b/include/uapi/linux/um_timetravel.h
> @@ -77,9 +77,9 @@ enum um_timetravel_ops {
> /**
> * @UM_TIMETRAVEL_WAIT: Indicate waiting for the previously requested
> * runtime, new requests may be made while waiting (e.g. due to
> - * interrupts); the time field is ignored. The calendar must process
> - * this message and later send a %UM_TIMETRAVEL_RUN message when
> - * the host can run again.
> + * interrupts); the time field contains the next requested runtime
> + * for consistency checks. The calendar must process this message and
> + * later send a %UM_TIMETRAVEL_RUN message when the host can run again.
Indentation got messed up here though.
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH RFC 06/11] um: always send UM_TIMETRAVEL_REQUEST from ISRs
2023-11-03 16:41 [PATCH RFC 00/11] Several Time Travel Mode Enhancements Benjamin Beichler
` (4 preceding siblings ...)
2023-11-03 16:41 ` [PATCH RFC 05/11] um: Add final request time to TT wait message Benjamin Beichler
@ 2023-11-03 16:41 ` Benjamin Beichler
2023-11-06 20:32 ` Johannes Berg
2023-11-03 16:41 ` [PATCH RFC 07/11] um: add TIMETRAVEL_REQUEST handler to request latest event Benjamin Beichler
` (4 subsequent siblings)
10 siblings, 1 reply; 36+ messages in thread
From: Benjamin Beichler @ 2023-11-03 16:41 UTC (permalink / raw)
To: Richard Weinberger, Anton Ivanov, Johannes Berg
Cc: linux-um, Benjamin Beichler
The number of UM_TIMETRAVEL_REQUEST msgs is tried to reduced, if they
are redundant via the time_travel_ext_prev_request_valid flag. This can
create a race condition, when an interrupt happens and the idle loop
wants to wait.
This means, sometimes the UM_TIMETRAVEL_REQUEST are sent(when the idle
loop already started waiting) and sometimes not, which makes it harder
to implement the other end of the scheduler. Therefore, this fix make
the time travel protocol a bit more deterministic by always sending the
UM_TIMETRAVEL_REQUEST msg.
Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
---
arch/um/kernel/time.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
index abd5fd0f62ee..54fc4a69cb59 100644
--- a/arch/um/kernel/time.c
+++ b/arch/um/kernel/time.c
@@ -446,6 +446,7 @@ void time_travel_add_irq_event(struct time_travel_event *e)
BUG_ON(time_travel_mode != TT_MODE_EXTERNAL);
time_travel_ext_get_time();
+ time_travel_ext_prev_request_valid = false;
/*
* We could model interrupt latency here, for now just
* don't have any latency at all and request the exact
--
2.34.1
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply related [flat|nested] 36+ messages in thread* Re: [PATCH RFC 06/11] um: always send UM_TIMETRAVEL_REQUEST from ISRs
2023-11-03 16:41 ` [PATCH RFC 06/11] um: always send UM_TIMETRAVEL_REQUEST from ISRs Benjamin Beichler
@ 2023-11-06 20:32 ` Johannes Berg
0 siblings, 0 replies; 36+ messages in thread
From: Johannes Berg @ 2023-11-06 20:32 UTC (permalink / raw)
To: Benjamin Beichler, Richard Weinberger, Anton Ivanov; +Cc: linux-um
On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
> The number of UM_TIMETRAVEL_REQUEST msgs is tried to reduced, if they
> are redundant via the time_travel_ext_prev_request_valid flag.
>
This ... doesn't parse so well. I think I get what you're trying to say,
but it's a bit awkward?
Maybe say "The code attempts to reduce the number of ... messages if
they are redundant. This is achieved with the ... flag." or something
like that?
At least that's what I think what you're trying to say :)
> This can
> create a race condition, when an interrupt happens and the idle loop
> wants to wait.
>
> This means, sometimes the UM_TIMETRAVEL_REQUEST are sent(when the idle
> loop already started waiting) and sometimes not, which makes it harder
> to implement the other end of the scheduler.
Not sure I get that though, why does it matter? Just keep track of the
last request?
Do you have some kind of (calendar) logs that would show the different
message behaviour?
FWIW, we've also got work pending for a while now that we should submit
that implements scheduling via shared memory for the most part, so that
all these REQUEST and time update messages just go away _entirely_,
which helps a lot for performance too.
> +++ b/arch/um/kernel/time.c
> @@ -446,6 +446,7 @@ void time_travel_add_irq_event(struct time_travel_event *e)
> BUG_ON(time_travel_mode != TT_MODE_EXTERNAL);
>
> time_travel_ext_get_time();
> + time_travel_ext_prev_request_valid = false;
>
At the very least I think there should be a comment here, but I'm not
really convinced yet why this makes sense :)
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH RFC 07/11] um: add TIMETRAVEL_REQUEST handler to request latest event
2023-11-03 16:41 [PATCH RFC 00/11] Several Time Travel Mode Enhancements Benjamin Beichler
` (5 preceding siblings ...)
2023-11-03 16:41 ` [PATCH RFC 06/11] um: always send UM_TIMETRAVEL_REQUEST from ISRs Benjamin Beichler
@ 2023-11-03 16:41 ` Benjamin Beichler
2023-11-06 20:33 ` Johannes Berg
2023-11-03 16:41 ` [PATCH RFC 08/11] um: Protect accesses to the timetravel event list Benjamin Beichler
` (3 subsequent siblings)
10 siblings, 1 reply; 36+ messages in thread
From: Benjamin Beichler @ 2023-11-03 16:41 UTC (permalink / raw)
To: Richard Weinberger, Anton Ivanov, Johannes Berg
Cc: linux-um, Benjamin Beichler
Allow the so called Calendar (or simulation master) to request the
latest timetravel event. This can be used in situations, where the
former request message from UML to Calendar was missed.
Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
---
arch/um/kernel/time.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
index 54fc4a69cb59..e513256eadfe 100644
--- a/arch/um/kernel/time.c
+++ b/arch/um/kernel/time.c
@@ -41,6 +41,7 @@ static bool time_travel_ext_prev_request_valid;
static unsigned long long time_travel_ext_prev_request;
static bool time_travel_ext_free_until_valid;
static unsigned long long time_travel_ext_free_until;
+static struct time_travel_event *time_travel_first_event(void);
static void time_travel_set_time(unsigned long long ns)
{
@@ -66,6 +67,7 @@ static void time_travel_handle_message(struct um_timetravel_msg *msg,
.op = UM_TIMETRAVEL_ACK,
};
int ret;
+ struct time_travel_event *event;
/*
* We can't unlock here, but interrupt signals with a timetravel_handler
@@ -103,6 +105,10 @@ static void time_travel_handle_message(struct um_timetravel_msg *msg,
time_travel_ext_free_until_valid = true;
time_travel_ext_free_until = msg->time;
break;
+ case UM_TIMETRAVEL_REQUEST:
+ event = time_travel_first_event();
+ resp.time = event->time;
+ break;
}
resp.seq = msg->seq;
--
2.34.1
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply related [flat|nested] 36+ messages in thread* Re: [PATCH RFC 07/11] um: add TIMETRAVEL_REQUEST handler to request latest event
2023-11-03 16:41 ` [PATCH RFC 07/11] um: add TIMETRAVEL_REQUEST handler to request latest event Benjamin Beichler
@ 2023-11-06 20:33 ` Johannes Berg
2023-11-10 16:23 ` Benjamin Beichler
0 siblings, 1 reply; 36+ messages in thread
From: Johannes Berg @ 2023-11-06 20:33 UTC (permalink / raw)
To: Benjamin Beichler, Richard Weinberger, Anton Ivanov; +Cc: linux-um
On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
> Allow the so called Calendar (or simulation master) to request the
> latest timetravel event. This can be used in situations, where the
> former request message from UML to Calendar was missed.
How ... is it possible to _miss_ a message?! It must've even sent an ACK
to it...?
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH RFC 07/11] um: add TIMETRAVEL_REQUEST handler to request latest event
2023-11-06 20:33 ` Johannes Berg
@ 2023-11-10 16:23 ` Benjamin Beichler
2023-11-10 17:19 ` Johannes Berg
0 siblings, 1 reply; 36+ messages in thread
From: Benjamin Beichler @ 2023-11-10 16:23 UTC (permalink / raw)
To: Johannes Berg, Richard Weinberger, Anton Ivanov; +Cc: linux-um
Am 06.11.2023 um 21:33 schrieb Johannes Berg:
> ________________________________
>
> On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
>> Allow the so called Calendar (or simulation master) to request the
>> latest timetravel event. This can be used in situations, where the
>> former request message from UML to Calendar was missed.
> How ... is it possible to _miss_ a message?! It must've even sent an ACK
> to it...?
I must admit, this patch is older than those for protecting the event
list. I think mostly I mean here, the UM-instance miss to send the
current request, not that the calendar missed it :-)
We were able to trigger that situation often, so I made some call to
make a check what the current state in the instance is.
Benjamin
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH RFC 07/11] um: add TIMETRAVEL_REQUEST handler to request latest event
2023-11-10 16:23 ` Benjamin Beichler
@ 2023-11-10 17:19 ` Johannes Berg
0 siblings, 0 replies; 36+ messages in thread
From: Johannes Berg @ 2023-11-10 17:19 UTC (permalink / raw)
To: Benjamin Beichler, Richard Weinberger, Anton Ivanov; +Cc: linux-um
On Fri, 2023-11-10 at 17:23 +0100, Benjamin Beichler wrote:
> Am 06.11.2023 um 21:33 schrieb Johannes Berg:
> > ________________________________
> >
> > On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
> > > Allow the so called Calendar (or simulation master) to request the
> > > latest timetravel event. This can be used in situations, where the
> > > former request message from UML to Calendar was missed.
> > How ... is it possible to _miss_ a message?! It must've even sent an ACK
> > to it...?
> I must admit, this patch is older than those for protecting the event
> list. I think mostly I mean here, the UM-instance miss to send the
> current request, not that the calendar missed it :-)
Oh. I think we should rather fix that bug though. I don't think we've
observed it.
I'm not entirely opposed to this, but relying on it for correctness
seems wrong.
Also, I've referred to it before, but with the shared memory the whole
thing kind of goes away anyway since we just write the (earliest)
request into the shared memory.
Maybe if not many kids come singing tonight I'll try to send out some
more patches ;)
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH RFC 08/11] um: Protect accesses to the timetravel event list
2023-11-03 16:41 [PATCH RFC 00/11] Several Time Travel Mode Enhancements Benjamin Beichler
` (6 preceding siblings ...)
2023-11-03 16:41 ` [PATCH RFC 07/11] um: add TIMETRAVEL_REQUEST handler to request latest event Benjamin Beichler
@ 2023-11-03 16:41 ` Benjamin Beichler
2023-11-06 20:37 ` Johannes Berg
2023-11-03 16:41 ` [PATCH RFC 09/11] um: Delay timer_read in time travel mode only after consecutive reads Benjamin Beichler
` (2 subsequent siblings)
10 siblings, 1 reply; 36+ messages in thread
From: Benjamin Beichler @ 2023-11-03 16:41 UTC (permalink / raw)
To: Richard Weinberger, Anton Ivanov, Johannes Berg
Cc: linux-um, Benjamin Beichler
When the timetravel ext modus is used, accessing the timetravel event
list can be interrupted by a message on the timetravel socket in the
SIGIO signal handler. This interruption can potentially modify the event
list, leading to race conditions that cause deadlocks in the timetravel
protocol or disrupt the ordered nature of the list, triggering bugs.
Previously, the normal irq-save function did not intentionally block the
timetravel handlers. However, the additional (un)block_signals_hard
functions do block them. Therefore, these functions have been added at
the appropriate places to address the issue.
It's worth noting that although the functions are named as blocking,
they primarily defer the actual execution of the SIGIO handlers until
the unblock call. If no signal was issued, this mainly results in a
variable assignment and a memory barrier.
Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
---
arch/um/kernel/time.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
index e513256eadfe..f1b2ca45994d 100644
--- a/arch/um/kernel/time.c
+++ b/arch/um/kernel/time.c
@@ -288,6 +288,7 @@ static void __time_travel_add_event(struct time_travel_event *e,
e->time = time;
local_irq_save(flags);
+ block_signals_hard();
list_for_each_entry(tmp, &time_travel_events, list) {
/*
* Add the new entry before one with higher time,
@@ -310,6 +311,7 @@ static void __time_travel_add_event(struct time_travel_event *e,
tmp = time_travel_first_event();
time_travel_ext_update_request(tmp->time);
time_travel_next_event = tmp->time;
+ unblock_signals_hard();
local_irq_restore(flags);
}
@@ -349,6 +351,7 @@ void deliver_time_travel_irqs(void)
return;
local_irq_save(flags);
+ block_signals_hard();
irq_enter();
while ((e = list_first_entry_or_null(&time_travel_irqs,
struct time_travel_event,
@@ -358,6 +361,7 @@ void deliver_time_travel_irqs(void)
e->fn(e);
}
irq_exit();
+ unblock_signals_hard();
local_irq_restore(flags);
}
@@ -370,7 +374,9 @@ static void time_travel_deliver_event(struct time_travel_event *e)
*/
e->fn(e);
} else if (irqs_disabled()) {
+ block_signals_hard();
list_add_tail(&e->list, &time_travel_irqs);
+ unblock_signals_hard();
/*
* set pending again, it was set to false when the
* event was deleted from the original list, but
@@ -395,8 +401,10 @@ bool time_travel_del_event(struct time_travel_event *e)
if (!e->pending)
return false;
local_irq_save(flags);
+ block_signals_hard();
list_del(&e->list);
e->pending = false;
+ unblock_signals_hard();
local_irq_restore(flags);
return true;
}
--
2.34.1
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply related [flat|nested] 36+ messages in thread* Re: [PATCH RFC 08/11] um: Protect accesses to the timetravel event list
2023-11-03 16:41 ` [PATCH RFC 08/11] um: Protect accesses to the timetravel event list Benjamin Beichler
@ 2023-11-06 20:37 ` Johannes Berg
0 siblings, 0 replies; 36+ messages in thread
From: Johannes Berg @ 2023-11-06 20:37 UTC (permalink / raw)
To: Benjamin Beichler, Richard Weinberger, Anton Ivanov; +Cc: linux-um
On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
> When the timetravel ext modus is used, accessing the timetravel event
> list can be interrupted by a message on the timetravel socket in the
> SIGIO signal handler. This interruption can potentially modify the event
> list, leading to race conditions that cause deadlocks in the timetravel
> protocol or disrupt the ordered nature of the list, triggering bugs.
>
> Previously, the normal irq-save function did not intentionally block the
> timetravel handlers. However, the additional (un)block_signals_hard
> functions do block them. Therefore, these functions have been added at
> the appropriate places to address the issue.
>
> It's worth noting that although the functions are named as blocking,
> they primarily defer the actual execution of the SIGIO handlers until
> the unblock call. If no signal was issued, this mainly results in a
> variable assignment and a memory barrier.
Ahh ... _now_ I'm starting to understand what you meant earlier wrt.
signals and blocking and it not doing anything ...
> local_irq_save(flags);
> + block_signals_hard();
> list_for_each_entry(tmp, &time_travel_events, list) {
Yeah, indeed, signals are interrupts and we process them at a lower
level, so here our abstraction of disabling interrupts is leaky because
the signal can still happen to modify the list due to the
time_travel_add_irq_event() ...
Note I also have a race fix for signals_blocked_pending somewhere...
Need to find time to flush out more of our patches.
But yeah, I guess this makes sense.
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH RFC 09/11] um: Delay timer_read in time travel mode only after consecutive reads
2023-11-03 16:41 [PATCH RFC 00/11] Several Time Travel Mode Enhancements Benjamin Beichler
` (7 preceding siblings ...)
2023-11-03 16:41 ` [PATCH RFC 08/11] um: Protect accesses to the timetravel event list Benjamin Beichler
@ 2023-11-03 16:41 ` Benjamin Beichler
2023-11-06 20:40 ` Johannes Berg
2023-11-03 16:41 ` [PATCH RFC 10/11] um: Delay timer_read only in possible busy loops in TT-mode Benjamin Beichler
2023-11-03 16:41 ` [PATCH RFC 11/11] um: Remove all TSC flags when using Time Travel Mode Benjamin Beichler
10 siblings, 1 reply; 36+ messages in thread
From: Benjamin Beichler @ 2023-11-03 16:41 UTC (permalink / raw)
To: Richard Weinberger, Anton Ivanov, Johannes Berg
Cc: linux-um, Benjamin Beichler
Given the presence of numerous timer_read calls in well-behaved code
within the kernel and userspace, particularly those that do not employ
busy loops, we introduce a delay in reading the timer only after several
consecutive attempts (currently set at 10).
Unfortunately, it is challenging to differentiate between various
callers and pinpoint specific misbehaving code, so this is only a
mediocre heuristic.
Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
---
arch/um/kernel/time.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
index f1b2ca45994d..f76237cfc1ea 100644
--- a/arch/um/kernel/time.c
+++ b/arch/um/kernel/time.c
@@ -723,6 +723,9 @@ static irqreturn_t um_timer(int irq, void *dev)
static u64 timer_read(struct clocksource *cs)
{
+ static unsigned long long last_timer_read;
+ static int consecutive_calls_at_same_time;
+
if (time_travel_mode != TT_MODE_OFF) {
/*
* We make reading the timer cost a bit so that we don't get
@@ -736,8 +739,14 @@ static u64 timer_read(struct clocksource *cs)
* "what do I do next" and onstack event we use to know when
* to return from time_travel_update_time().
*/
+ if (last_timer_read != time_travel_time) {
+ last_timer_read = time_travel_time;
+ consecutive_calls_at_same_time = 0;
+ } else {
+ consecutive_calls_at_same_time++;
+ }
if (!irqs_disabled() && !in_interrupt() && !in_softirq() &&
- !time_travel_ext_waiting)
+ !time_travel_ext_waiting && consecutive_calls_at_same_time > 10)
time_travel_update_time(time_travel_time +
TIMER_MULTIPLIER,
false);
--
2.34.1
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply related [flat|nested] 36+ messages in thread* Re: [PATCH RFC 09/11] um: Delay timer_read in time travel mode only after consecutive reads
2023-11-03 16:41 ` [PATCH RFC 09/11] um: Delay timer_read in time travel mode only after consecutive reads Benjamin Beichler
@ 2023-11-06 20:40 ` Johannes Berg
0 siblings, 0 replies; 36+ messages in thread
From: Johannes Berg @ 2023-11-06 20:40 UTC (permalink / raw)
To: Benjamin Beichler, Richard Weinberger, Anton Ivanov; +Cc: linux-um
On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
> Given the presence of numerous timer_read calls in well-behaved code
> within the kernel and userspace, particularly those that do not employ
> busy loops, we introduce a delay in reading the timer only after several
> consecutive attempts (currently set at 10).
>
> Unfortunately, it is challenging to differentiate between various
> callers and pinpoint specific misbehaving code, so this is only a
> mediocre heuristic.
Could use with some more comments in the code I guess. :)
Generally I'd say what we have now isn't a _problem_, but in a sense if
you implement infinite CPU speed ... why does reading the time take
time? Not that I think it's _wrong_, and arguably you always should've
expected reading time to take time, but ...
Arguably though this should be squashed with the next patch since you
restrict it there?
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH RFC 10/11] um: Delay timer_read only in possible busy loops in TT-mode
2023-11-03 16:41 [PATCH RFC 00/11] Several Time Travel Mode Enhancements Benjamin Beichler
` (8 preceding siblings ...)
2023-11-03 16:41 ` [PATCH RFC 09/11] um: Delay timer_read in time travel mode only after consecutive reads Benjamin Beichler
@ 2023-11-03 16:41 ` Benjamin Beichler
2023-11-06 20:51 ` Johannes Berg
2023-11-03 16:41 ` [PATCH RFC 11/11] um: Remove all TSC flags when using Time Travel Mode Benjamin Beichler
10 siblings, 1 reply; 36+ messages in thread
From: Benjamin Beichler @ 2023-11-03 16:41 UTC (permalink / raw)
To: Richard Weinberger, Anton Ivanov, Johannes Berg
Cc: linux-um, Benjamin Beichler
Some userspace programs use the current timestamp as a looping condition
(directly or indirectly), which can lead to infinite loops in TT-mode
since the timestamp doesn't change until the next time step.
Commit 065038706f77 ("um: Support time travel mode") introduced a
workaround by always inserting a time step when the current timestamp is
read. However, this introduces delays, even in cases where such a loop
doesn't exist, for example, in filesystem code that updates access
timestamps.
This slows down external TT-mode as more simulation roundtrips are
required, and it unnecessarily affects the determinism and accuracy of
the simulation.
This patch attempts to detect potential busy loops by identifying the
currently executed syscall that would return a timestamp. If 10
consecutive calls at the same timestamp occur, the timer_read is
delayed.
Since the patch does not consider the PID of the caller, it might be
fooled by legitimate processes that take a single timestamp. However,
the result is no worse than the original behavior.
Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
---
arch/um/kernel/time.c | 27 +++++++++++++++++++++++++--
1 file changed, 25 insertions(+), 2 deletions(-)
diff --git a/arch/um/kernel/time.c b/arch/um/kernel/time.c
index f76237cfc1ea..1782f1ed140e 100644
--- a/arch/um/kernel/time.c
+++ b/arch/um/kernel/time.c
@@ -22,6 +22,7 @@
#include <linux/time-internal.h>
#include <linux/um_timetravel.h>
#include <shared/init.h>
+#include <asm/syscall-generic.h>
#ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT
enum time_travel_mode time_travel_mode;
@@ -721,6 +722,27 @@ static irqreturn_t um_timer(int irq, void *dev)
return IRQ_HANDLED;
}
+static const int suspicious_busy_loop_syscalls[] = {
+ 36, //sys_getitimer
+ 96, //sys_gettimeofday
+ 201, //sys_time
+ 224, //sys_timer_gettime
+ 228, //sys_clock_gettime
+ 287, //sys_timerfd_gettime
+};
+
+static bool suspicious_busy_loop(void)
+{
+ int i;
+ int syscall = syscall_get_nr(current, task_pt_regs(current));
+
+ for (i = 0; i < ARRAY_SIZE(suspicious_busy_loop_syscalls); i++) {
+ if (suspicious_busy_loop_syscalls[i] == syscall)
+ return true;
+ }
+ return false;
+};
+
static u64 timer_read(struct clocksource *cs)
{
static unsigned long long last_timer_read;
@@ -742,11 +764,12 @@ static u64 timer_read(struct clocksource *cs)
if (last_timer_read != time_travel_time) {
last_timer_read = time_travel_time;
consecutive_calls_at_same_time = 0;
- } else {
+ } else if (suspicious_busy_loop()) {
consecutive_calls_at_same_time++;
}
if (!irqs_disabled() && !in_interrupt() && !in_softirq() &&
- !time_travel_ext_waiting && consecutive_calls_at_same_time > 10)
+ !time_travel_ext_waiting && consecutive_calls_at_same_time > 10 &&
+ suspicious_busy_loop())
time_travel_update_time(time_travel_time +
TIMER_MULTIPLIER,
false);
--
2.34.1
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply related [flat|nested] 36+ messages in thread* Re: [PATCH RFC 10/11] um: Delay timer_read only in possible busy loops in TT-mode
2023-11-03 16:41 ` [PATCH RFC 10/11] um: Delay timer_read only in possible busy loops in TT-mode Benjamin Beichler
@ 2023-11-06 20:51 ` Johannes Berg
2023-11-10 15:54 ` Benjamin Beichler
0 siblings, 1 reply; 36+ messages in thread
From: Johannes Berg @ 2023-11-06 20:51 UTC (permalink / raw)
To: Benjamin Beichler, Richard Weinberger, Anton Ivanov; +Cc: linux-um
On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
> This slows down external TT-mode as more simulation roundtrips are
> required, and it unnecessarily affects the determinism and accuracy of
> the simulation.
I still don't think this is really true, it doesn't really affect
determinism? It makes it ... different, sure, but not non-deterministic?
> +static const int suspicious_busy_loop_syscalls[] = {
> + 36, //sys_getitimer
> + 96, //sys_gettimeofday
> + 201, //sys_time
> + 224, //sys_timer_gettime
> + 228, //sys_clock_gettime
> + 287, //sys_timerfd_gettime
> +};
That's kind of awful. Surely we can use __NR_timer_gettime etc. here at
least?
> +static bool suspicious_busy_loop(void)
> +{
> + int i;
> + int syscall = syscall_get_nr(current, task_pt_regs(current));
> +
> + for (i = 0; i < ARRAY_SIZE(suspicious_busy_loop_syscalls); i++) {
> + if (suspicious_busy_loop_syscalls[i] == syscall)
> + return true;
> + }
Might also be faster to have a bitmap? But ... also kind of awkward I
guess.
I dunno. I'm not even sure what you're trying to achieve - apart from
"determinism" which seems odd or even wrong, and speed, which is
probably easier done with a better free-until and the shared memory
calendar we have been working on.
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread* Re: [PATCH RFC 10/11] um: Delay timer_read only in possible busy loops in TT-mode
2023-11-06 20:51 ` Johannes Berg
@ 2023-11-10 15:54 ` Benjamin Beichler
2023-11-10 16:39 ` Benjamin Berg
2023-11-10 17:29 ` Johannes Berg
0 siblings, 2 replies; 36+ messages in thread
From: Benjamin Beichler @ 2023-11-10 15:54 UTC (permalink / raw)
To: Johannes Berg, Richard Weinberger, Anton Ivanov; +Cc: linux-um
Am 06.11.2023 um 21:51 schrieb Johannes Berg:
> On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
>> This slows down external TT-mode as more simulation roundtrips are
>> required, and it unnecessarily affects the determinism and accuracy of
>> the simulation.
> I still don't think this is really true, it doesn't really affect
> determinism? It makes it ... different, sure, but not non-deterministic?
I intentionally kept it vague, but what I meant is that it's
unnecessarily challenging to determine.
Perhaps I should mention that I'm running an unmodified Ubuntu rootfs
with systemd, which starts many daemons and other processes.
To me, it seems illogical to delay everything just because one process
is waiting for a timestamp.
At the moment, we haven't patched the random device that fetches random
bytes from the host (do you already have a patch for this?),
so complete repeatability isn't guaranteed at the moment. However, that
could be a logical next step.
>> +static const int suspicious_busy_loop_syscalls[] = {
>> + 36, //sys_getitimer
>> + 96, //sys_gettimeofday
>> + 201, //sys_time
>> + 224, //sys_timer_gettime
>> + 228, //sys_clock_gettime
>> + 287, //sys_timerfd_gettime
>> +};
> That's kind of awful. Surely we can use __NR_timer_gettime etc. here at
> least?
Actually, this was a quick attempt to address the issue, and during that
period, I couldn't locate the appropriate macros.
These numbers are generated from arch/x86/entry/syscalls/syscall_64.tbl
(or 32 if configured in that manner).
I might be overlooking something, but it seems that __NR_timer_gettime
isn't defined in the kernel. If you have a better reference for this
translation, I'd appreciate it.
I could verify if the current syscall translates into the corresponding
function symbol in the um-syscall table.
>> +static bool suspicious_busy_loop(void)
>> +{
>> + int i;
>> + int syscall = syscall_get_nr(current, task_pt_regs(current));
>> +
>> + for (i = 0; i < ARRAY_SIZE(suspicious_busy_loop_syscalls); i++) {
>> + if (suspicious_busy_loop_syscalls[i] == syscall)
>> + return true;
>> + }
> Might also be faster to have a bitmap? But ... also kind of awkward I
> guess.
Actually, a short fixed size array should be optimized quite well with
loop unrolling or other stuff, isn't it? I could also do a switch with
all calls, but this loop seems for me the easiest.
> I dunno. I'm not even sure what you're trying to achieve - apart from
> "determinism" which seems odd or even wrong, and speed, which is
> probably easier done with a better free-until and the shared memory
> calendar we have been working on.
In my perspective, delaying get_timer only serves as a tie-breaker for
poorly behaving software that resorts to a busy-loop while waiting for
time to advance.
While this behavior might not be uncommon, why penalize all processes
for it?
Consider an experiment where I aim to measure the impact of network
latency on software. Sometimes, the response latency fluctuates
because a background task was scheduled randomly but deterministically
at the same time, obtaining a timestamp that blocks all other
processes and advances simulation time. This action simply undermines
the utility of the time travel mode unnecessarily.
However, software actively waiting for time advancement in a busy loop
achieves its goal. It’s almost a win-win situation, isn't it?
Benjamin
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread* Re: [PATCH RFC 10/11] um: Delay timer_read only in possible busy loops in TT-mode
2023-11-10 15:54 ` Benjamin Beichler
@ 2023-11-10 16:39 ` Benjamin Berg
2023-11-10 17:29 ` Johannes Berg
1 sibling, 0 replies; 36+ messages in thread
From: Benjamin Berg @ 2023-11-10 16:39 UTC (permalink / raw)
To: Benjamin Beichler, Johannes Berg, Richard Weinberger,
Anton Ivanov; +Cc: linux-um
[-- Attachment #1: Type: text/plain, Size: 1129 bytes --]
On Fri, 2023-11-10 at 16:54 +0100, Benjamin Beichler wrote:
> At the moment, we haven't patched the random device that fetches random
> bytes from the host (do you already have a patch for this?),
> so complete repeatability isn't guaranteed at the moment. However, that
> could be a logical next step.
Right, we have the attached kernel patches internally. This simply
disables some of the random sources and replaces os_getrandom with
returning static random from the UML_RANDOM environment variable.
I doubt that it makes sense to upstream these patches, but may we can
include them as patch files in USFSTL or so.
The second piece is using a mount namespace to ensure that the linux
command line is identical between runs and that the location of all
files that are accessed directly from the host through hostfs never
changes.
The last piece was setting GLIBC_TUNABLES=-AVX512CD in the environment
just in case the CPU feature set is slightly different. That would
cause ld.so to search for a different set of optimized library versions
(affecting syscalls and with that randomness).
Benjamin
[-- Attachment #2: 0001-um-Use-fixed-random-seed-if-UML_RANDOM-is-set.patch --]
[-- Type: text/x-patch, Size: 1758 bytes --]
From 0b51202872111f1a5f7a59435ff741ef0272d30f Mon Sep 17 00:00:00 2001
From: Benjamin Berg <benjamin.berg@intel.com>
Date: Thu, 16 Mar 2023 13:19:37 +0200
Subject: [PATCH 1/3] um: Use fixed random seed if UML_RANDOM is set
This helps with reproducable test runs.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
---
arch/um/os-Linux/util.c | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
diff --git a/arch/um/os-Linux/util.c b/arch/um/os-Linux/util.c
index fc0f2a9dee5a..1e7b19272dfa 100644
--- a/arch/um/os-Linux/util.c
+++ b/arch/um/os-Linux/util.c
@@ -99,6 +99,42 @@ static inline void __attribute__ ((noreturn)) uml_abort(void)
ssize_t os_getrandom(void *buf, size_t len, unsigned int flags)
{
+ static char random_pattern[128];
+ static ssize_t random_pattern_len = -1;
+
+ /* This happens when called by setup_arch */
+ if (random_pattern_len == -1) {
+ const char *env;
+
+ env = getenv("UML_RANDOM");
+ if (env) {
+ random_pattern_len =
+ strlen(env) > sizeof(random_pattern) ?
+ sizeof(random_pattern) : strlen(env);
+ memcpy(random_pattern, env, random_pattern_len);
+ } else {
+ random_pattern_len = 0;
+ }
+ }
+
+ if (random_pattern_len > 0) {
+ size_t tail = len;
+ /*
+ * If the returned length is too short, then the kernel might
+ * loop trying to generate random from the passing of time.
+ * Which seems to possibly infinite loop in time-travel mode.
+ * So just repeat the given pattern.
+ */
+ while (tail > random_pattern_len) {
+ memcpy(buf, random_pattern, random_pattern_len);
+ tail -= random_pattern_len;
+ buf += random_pattern_len;
+ }
+ memcpy(buf, random_pattern, tail);
+
+ return len;
+ }
+
return getrandom(buf, len, flags);
}
--
2.41.0
[-- Attachment #3: 0002-random-disable-interrupt-random-source.patch --]
[-- Type: text/x-patch, Size: 1057 bytes --]
From b8aa74aed9fa008b908682bfa085461635e876e0 Mon Sep 17 00:00:00 2001
From: Benjamin Berg <benjamin.berg@intel.com>
Date: Thu, 16 Mar 2023 13:21:19 +0200
Subject: [PATCH 2/3] random: disable interrupt random source
Interrupts in our UML environment are signals, which can be delivered at
somewhat random times depending on host scheduling. Disable these as a
source to make the random reproducible between runs.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
---
drivers/char/random.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 253f2ddb8913..db2c5a296c05 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1093,6 +1093,9 @@ void add_interrupt_randomness(int irq)
struct pt_regs *regs = get_irq_regs();
unsigned int new_count;
+ /* UML seems to not always get the signals entirely at the same time */
+ return;
+
fast_mix(fast_pool->pool, entropy,
(regs ? instruction_pointer(regs) : _RET_IP_) ^ swab(irq));
new_count = ++fast_pool->count;
--
2.41.0
[-- Attachment #4: 0003-random-do-not-include-utsname-in-early-random.patch --]
[-- Type: text/x-patch, Size: 1021 bytes --]
From c62e671073cfe990fd2200c81defae98b1e9258b Mon Sep 17 00:00:00 2001
From: Benjamin Berg <benjamin.berg@intel.com>
Date: Wed, 22 Mar 2023 17:56:37 +0100
Subject: [PATCH 3/3] random: do not include utsname in early random
The uts name includes information about the compile time and such and
changes. Exclude it, so that different kernel compilations will see
the same random numbers.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
---
drivers/char/random.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index db2c5a296c05..dd4aa42a4404 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -865,7 +865,7 @@ void __init random_init_early(const char *command_line)
++i;
}
- _mix_pool_bytes(init_utsname(), sizeof(*(init_utsname())));
+ /* _mix_pool_bytes(init_utsname(), sizeof(*(init_utsname()))); */
_mix_pool_bytes(command_line, strlen(command_line));
/* Reseed if already seeded by earlier phases. */
--
2.41.0
[-- Attachment #5: Type: text/plain, Size: 152 bytes --]
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply related [flat|nested] 36+ messages in thread* Re: [PATCH RFC 10/11] um: Delay timer_read only in possible busy loops in TT-mode
2023-11-10 15:54 ` Benjamin Beichler
2023-11-10 16:39 ` Benjamin Berg
@ 2023-11-10 17:29 ` Johannes Berg
1 sibling, 0 replies; 36+ messages in thread
From: Johannes Berg @ 2023-11-10 17:29 UTC (permalink / raw)
To: Benjamin Beichler, Richard Weinberger, Anton Ivanov; +Cc: linux-um
On Fri, 2023-11-10 at 16:54 +0100, Benjamin Beichler wrote:
> Am 06.11.2023 um 21:51 schrieb Johannes Berg:
> > On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
> > > This slows down external TT-mode as more simulation roundtrips are
> > > required, and it unnecessarily affects the determinism and accuracy of
> > > the simulation.
> > I still don't think this is really true, it doesn't really affect
> > determinism? It makes it ... different, sure, but not non-deterministic?
> I intentionally kept it vague, but what I meant is that it's
> unnecessarily challenging to determine.
Yeah, ok, fair enough.
> Perhaps I should mention that I'm running an unmodified Ubuntu rootfs
> with systemd, which starts many daemons and other processes.
That sounds like a bit of a nightmare, to be honest, wouldn't you want
to keep things under tighter control? But I guess it really depends on
what you're trying to achieve.
> To me, it seems illogical to delay everything just because one process
> is waiting for a timestamp.
Yeah I guess we'll just have to disagree ;-) You're running some
process, so you've kind of decided to "give it time" of sorts, and in a
normal system reading time will always take time, just like everything
else :)
But anyway, I'm not really opposed to this patch, it's just ... not
great, I guess? And like I said, makes more sense to squash 9 and 10?
> At the moment, we haven't patched the random device that fetches random
> bytes from the host (do you already have a patch for this?),
> so complete repeatability isn't guaranteed at the moment. However, that
> could be a logical next step.
> > > +static const int suspicious_busy_loop_syscalls[] = {
> > > + 36, //sys_getitimer
> > > + 96, //sys_gettimeofday
> > > + 201, //sys_time
> > > + 224, //sys_timer_gettime
> > > + 228, //sys_clock_gettime
> > > + 287, //sys_timerfd_gettime
> > > +};
> > That's kind of awful. Surely we can use __NR_timer_gettime etc. here at
> > least?
> Actually, this was a quick attempt to address the issue, and during that
> period, I couldn't locate the appropriate macros.
>
> These numbers are generated from arch/x86/entry/syscalls/syscall_64.tbl
> (or 32 if configured in that manner).
>
> I might be overlooking something, but it seems that __NR_timer_gettime
> isn't defined in the kernel. If you have a better reference for this
> translation, I'd appreciate it.
Look at the arch/x86/include/generated/uapi/asm/unistd*.h files after
you build the tree. How do they actually get generated? Beats me.
> > > +static bool suspicious_busy_loop(void)
> > > +{
> > > + int i;
> > > + int syscall = syscall_get_nr(current, task_pt_regs(current));
> > > +
> > > + for (i = 0; i < ARRAY_SIZE(suspicious_busy_loop_syscalls); i++) {
> > > + if (suspicious_busy_loop_syscalls[i] == syscall)
> > > + return true;
> > > + }
> > Might also be faster to have a bitmap? But ... also kind of awkward I
> > guess.
> Actually, a short fixed size array should be optimized quite well with
> loop unrolling or other stuff, isn't it? I could also do a switch with
> all calls, but this loop seems for me the easiest.
Yeah, maybe. I haven't checked the output.
> > I dunno. I'm not even sure what you're trying to achieve - apart from
> > "determinism" which seems odd or even wrong, and speed, which is
> > probably easier done with a better free-until and the shared memory
> > calendar we have been working on.
> In my perspective, delaying get_timer only serves as a tie-breaker for
> poorly behaving software that resorts to a busy-loop while waiting for
> time to advance.
Yeah I guess that's true.
> While this behavior might not be uncommon, why penalize all processes
> for it?
Well I think we have a different sense of "penalize", I wouldn't say
that. I mean, you can't reasonably expect getting a timestamp doesn't
take any time at all, that's just not how physical reality works? Now
we're bending the rules here in that a lot of things that normally take
time suddenly don't, but I guess I don't fully understand why you're so
keen on bending the rules _all the way_.
But I think that's this:
> Consider an experiment where I aim to measure the impact of network
> latency on software. Sometimes, the response latency fluctuates
> because a background task was scheduled randomly but deterministically
"randomly but deterministically" is kind of fun 😂️
> at the same time, obtaining a timestamp that blocks all other
> processes and advances simulation time. This action simply undermines
> the utility of the time travel mode unnecessarily.
>
> However, software actively waiting for time advancement in a busy loop
> achieves its goal. It’s almost a win-win situation, isn't it?
Fair enough.
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH RFC 11/11] um: Remove all TSC flags when using Time Travel Mode
2023-11-03 16:41 [PATCH RFC 00/11] Several Time Travel Mode Enhancements Benjamin Beichler
` (9 preceding siblings ...)
2023-11-03 16:41 ` [PATCH RFC 10/11] um: Delay timer_read only in possible busy loops in TT-mode Benjamin Beichler
@ 2023-11-03 16:41 ` Benjamin Beichler
2023-11-03 18:45 ` Anton Ivanov
2023-11-06 20:52 ` Johannes Berg
10 siblings, 2 replies; 36+ messages in thread
From: Benjamin Beichler @ 2023-11-03 16:41 UTC (permalink / raw)
To: Richard Weinberger, Anton Ivanov, Johannes Berg
Cc: linux-um, Benjamin Beichler
In time travel mode, the internal time is only advanced at discrete
synchronization points, while the TSC register continuously increases.
Although the TSC registers remain accessible, we remove the TSC flags to
indicate to programs that properly check the TSC flag before reading
that the CPU lacks a proper TSC register.
This change is primarily cosmetic but may encourage some programs to use
gettimeofday and similar calls for time measurement.
Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
---
arch/um/kernel/um_arch.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
index b1bfed0c8528..6bfd531b5f87 100644
--- a/arch/um/kernel/um_arch.c
+++ b/arch/um/kernel/um_arch.c
@@ -30,6 +30,7 @@
#include <kern_util.h>
#include <mem_user.h>
#include <os.h>
+#include <timetravel.h>
#include "um_arch.h"
@@ -284,7 +285,8 @@ static void parse_host_cpu_flags(char *line)
{
int i;
for (i = 0; i < 32*NCAPINTS; i++) {
- if ((x86_cap_flags[i] != NULL) && strstr(line, x86_cap_flags[i]))
+ if ((x86_cap_flags[i] != NULL) && strstr(line, x86_cap_flags[i]) &&
+ !(time_travel_mode != TT_MODE_OFF && strstr("tsc", x86_cap_flags[i])))
set_cpu_cap(&boot_cpu_data, i);
}
}
--
2.34.1
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply related [flat|nested] 36+ messages in thread* Re: [PATCH RFC 11/11] um: Remove all TSC flags when using Time Travel Mode
2023-11-03 16:41 ` [PATCH RFC 11/11] um: Remove all TSC flags when using Time Travel Mode Benjamin Beichler
@ 2023-11-03 18:45 ` Anton Ivanov
2023-11-06 20:52 ` Johannes Berg
1 sibling, 0 replies; 36+ messages in thread
From: Anton Ivanov @ 2023-11-03 18:45 UTC (permalink / raw)
To: Benjamin Beichler, Richard Weinberger, Johannes Berg; +Cc: linux-um
On 03/11/2023 16:41, Benjamin Beichler wrote:
> In time travel mode, the internal time is only advanced at discrete
> synchronization points, while the TSC register continuously increases.
> Although the TSC registers remain accessible, we remove the TSC flags to
> indicate to programs that properly check the TSC flag before reading
> that the CPU lacks a proper TSC register.
>
> This change is primarily cosmetic but may encourage some programs to use
> gettimeofday and similar calls for time measurement.
>
> Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
> ---
> arch/um/kernel/um_arch.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
> index b1bfed0c8528..6bfd531b5f87 100644
> --- a/arch/um/kernel/um_arch.c
> +++ b/arch/um/kernel/um_arch.c
> @@ -30,6 +30,7 @@
> #include <kern_util.h>
> #include <mem_user.h>
> #include <os.h>
> +#include <timetravel.h>
>
> #include "um_arch.h"
>
> @@ -284,7 +285,8 @@ static void parse_host_cpu_flags(char *line)
> {
> int i;
> for (i = 0; i < 32*NCAPINTS; i++) {
> - if ((x86_cap_flags[i] != NULL) && strstr(line, x86_cap_flags[i]))
> + if ((x86_cap_flags[i] != NULL) && strstr(line, x86_cap_flags[i]) &&
> + !(time_travel_mode != TT_MODE_OFF && strstr("tsc", x86_cap_flags[i])))
> set_cpu_cap(&boot_cpu_data, i);
> }
> }
>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread* Re: [PATCH RFC 11/11] um: Remove all TSC flags when using Time Travel Mode
2023-11-03 16:41 ` [PATCH RFC 11/11] um: Remove all TSC flags when using Time Travel Mode Benjamin Beichler
2023-11-03 18:45 ` Anton Ivanov
@ 2023-11-06 20:52 ` Johannes Berg
1 sibling, 0 replies; 36+ messages in thread
From: Johannes Berg @ 2023-11-06 20:52 UTC (permalink / raw)
To: Benjamin Beichler, Richard Weinberger, Anton Ivanov; +Cc: linux-um
On Fri, 2023-11-03 at 16:41 +0000, Benjamin Beichler wrote:
> In time travel mode, the internal time is only advanced at discrete
> synchronization points, while the TSC register continuously increases.
> Although the TSC registers remain accessible, we remove the TSC flags to
> indicate to programs that properly check the TSC flag before reading
> that the CPU lacks a proper TSC register.
>
> This change is primarily cosmetic but may encourage some programs to use
> gettimeofday and similar calls for time measurement.
>
> Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
> ---
> arch/um/kernel/um_arch.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c
> index b1bfed0c8528..6bfd531b5f87 100644
> --- a/arch/um/kernel/um_arch.c
> +++ b/arch/um/kernel/um_arch.c
> @@ -30,6 +30,7 @@
> #include <kern_util.h>
> #include <mem_user.h>
> #include <os.h>
> +#include <timetravel.h>
>
> #include "um_arch.h"
>
> @@ -284,7 +285,8 @@ static void parse_host_cpu_flags(char *line)
> {
> int i;
> for (i = 0; i < 32*NCAPINTS; i++) {
> - if ((x86_cap_flags[i] != NULL) && strstr(line, x86_cap_flags[i]))
> + if ((x86_cap_flags[i] != NULL) && strstr(line, x86_cap_flags[i]) &&
> + !(time_travel_mode != TT_MODE_OFF && strstr("tsc", x86_cap_flags[i])))
> set_cpu_cap(&boot_cpu_data, i);
>
Makes sense, but the code formatting is all wrong ;-)
(indentation should only be to the logical level of the parentheses)
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 36+ messages in thread