* [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals
@ 2023-10-18 12:36 benjamin
2023-10-18 12:36 ` [PATCH 2/4] um: chan_user: catch EINTR when reading and writing benjamin
` (3 more replies)
0 siblings, 4 replies; 19+ messages in thread
From: benjamin @ 2023-10-18 12:36 UTC (permalink / raw)
To: linux-um; +Cc: Benjamin Berg
From: Benjamin Berg <benjamin.berg@intel.com>
When in time-travel mode, the eventfd events are read even when signals
are blocked as SIGIO still needs to be processed. In this case, the
event is cleared on the eventfd but the IRQ still needs to be fired
later.
We did already ensure that the SIGIO handler is run again. However, the
FDs are configured to be level triggered, so that eventfd will not
notify again. As such, add some logic to mark the IRQ as pending and
process it at the next opportunity.
To avoid duplication, reuse the logic used for the suspend/resume case.
This does not really change anything except for delaying running the
IRQs with timetravel_handler at a slightly later point in time (and
possibly running non-timetravel IRQs that shouldn't happen earlier).
While at it, move marking as pending into irq_event_handler as that is
the more logical place for it to happen.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
---
arch/um/kernel/irq.c | 78 ++++++++++++++++++++++++++++----------------
1 file changed, 49 insertions(+), 29 deletions(-)
diff --git a/arch/um/kernel/irq.c b/arch/um/kernel/irq.c
index 635d44606bfe..ceda4bd2e5ed 100644
--- a/arch/um/kernel/irq.c
+++ b/arch/um/kernel/irq.c
@@ -37,7 +37,7 @@ struct irq_reg {
bool pending;
bool wakeup;
#ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT
- bool pending_on_resume;
+ bool pending_event;
void (*timetravel_handler)(int, int, void *,
struct time_travel_event *);
struct time_travel_event event;
@@ -56,6 +56,9 @@ static DEFINE_SPINLOCK(irq_lock);
static LIST_HEAD(active_fds);
static DECLARE_BITMAP(irqs_allocated, UM_LAST_SIGNAL_IRQ);
static bool irqs_suspended;
+#ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT
+static bool irqs_pending;
+#endif
static void irq_io_loop(struct irq_reg *irq, struct uml_pt_regs *regs)
{
@@ -84,9 +87,12 @@ static void irq_event_handler(struct time_travel_event *ev)
{
struct irq_reg *reg = container_of(ev, struct irq_reg, event);
- /* do nothing if suspended - just to cause a wakeup */
- if (irqs_suspended)
+ /* do nothing if suspended; just cause a wakeup and mark as pending */
+ if (irqs_suspended) {
+ irqs_pending = true;
+ reg->pending_event = true;
return;
+ }
generic_handle_irq(reg->irq);
}
@@ -110,16 +116,47 @@ static bool irq_do_timetravel_handler(struct irq_entry *entry,
if (!reg->event.pending)
return false;
- if (irqs_suspended)
- reg->pending_on_resume = true;
return true;
}
+
+static void irq_do_pending_events(bool timetravel_handlers_only)
+{
+ struct irq_entry *entry;
+
+ if (!irqs_pending || timetravel_handlers_only)
+ return;
+
+ irqs_pending = false;
+
+ list_for_each_entry(entry, &active_fds, list) {
+ enum um_irq_type t;
+
+ for (t = 0; t < NUM_IRQ_TYPES; t++) {
+ struct irq_reg *reg = &entry->reg[t];
+
+ /*
+ * Any timetravel_handler was invoked already, just
+ * directly run the IRQ.
+ */
+ if (reg->pending_event) {
+ irq_enter();
+ generic_handle_irq(reg->irq);
+ irq_exit();
+ reg->pending_event = false;
+ }
+ }
+ }
+}
#else
static bool irq_do_timetravel_handler(struct irq_entry *entry,
enum um_irq_type t)
{
return false;
}
+
+static void irq_do_pending_events(bool timetravel_handlers_only)
+{
+}
#endif
static void sigio_reg_handler(int idx, struct irq_entry *entry, enum um_irq_type t,
@@ -145,6 +182,8 @@ static void sigio_reg_handler(int idx, struct irq_entry *entry, enum um_irq_type
*/
if (timetravel_handlers_only) {
#ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT
+ reg->pending_event = true;
+ irqs_pending = true;
mark_sigio_pending();
#endif
return;
@@ -162,6 +201,10 @@ static void _sigio_handler(struct uml_pt_regs *regs,
if (timetravel_handlers_only && !um_irq_timetravel_handler_used())
return;
+ /* Flush out pending events that were ignored due to time-travel. */
+ if (!irqs_suspended)
+ irq_do_pending_events(timetravel_handlers_only);
+
while (1) {
/* This is now lockless - epoll keeps back-referencesto the irqs
* which have trigger it so there is no need to walk the irq
@@ -543,30 +586,7 @@ void um_irqs_resume(void)
unsigned long flags;
- local_irq_save(flags);
-#ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT
- /*
- * We don't need to lock anything here since we're in resume
- * and nothing else is running, but have disabled IRQs so we
- * don't try anything else with the interrupt list from there.
- */
- list_for_each_entry(entry, &active_fds, list) {
- enum um_irq_type t;
-
- for (t = 0; t < NUM_IRQ_TYPES; t++) {
- struct irq_reg *reg = &entry->reg[t];
-
- if (reg->pending_on_resume) {
- irq_enter();
- generic_handle_irq(reg->irq);
- irq_exit();
- reg->pending_on_resume = false;
- }
- }
- }
-#endif
-
- spin_lock(&irq_lock);
+ spin_lock_irqsave(&irq_lock, flags);
list_for_each_entry(entry, &active_fds, list) {
if (entry->suspended) {
int err = os_set_fd_async(entry->fd);
--
2.41.0
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 2/4] um: chan_user: catch EINTR when reading and writing
2023-10-18 12:36 [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals benjamin
@ 2023-10-18 12:36 ` benjamin
2023-10-18 12:36 ` [PATCH 3/4] um: chan_user: retry partial writes benjamin
` (2 subsequent siblings)
3 siblings, 0 replies; 19+ messages in thread
From: benjamin @ 2023-10-18 12:36 UTC (permalink / raw)
To: linux-um; +Cc: Benjamin Berg
From: Benjamin Berg <benjamin.berg@intel.com>
If the read/write function returns an error then we expect to see an
event/IRQ later on. However, this will only happen after an EAGAIN as we
are using edge based event triggering.
As such, EINTR needs to be caught should it happen.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
---
arch/um/drivers/chan_user.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/um/drivers/chan_user.c b/arch/um/drivers/chan_user.c
index 25727ed648b7..29c68820522a 100644
--- a/arch/um/drivers/chan_user.c
+++ b/arch/um/drivers/chan_user.c
@@ -23,7 +23,7 @@ int generic_read(int fd, char *c_out, void *unused)
{
int n;
- n = read(fd, c_out, sizeof(*c_out));
+ CATCH_EINTR(n = read(fd, c_out, sizeof(*c_out)));
if (n > 0)
return n;
else if (n == 0)
@@ -39,7 +39,7 @@ int generic_write(int fd, const char *buf, int n, void *unused)
{
int err;
- err = write(fd, buf, n);
+ CATCH_EINTR(err = write(fd, buf, n));
if (err > 0)
return err;
else if (errno == EAGAIN)
--
2.41.0
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 3/4] um: chan_user: retry partial writes
2023-10-18 12:36 [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals benjamin
2023-10-18 12:36 ` [PATCH 2/4] um: chan_user: catch EINTR when reading and writing benjamin
@ 2023-10-18 12:36 ` benjamin
2023-10-18 12:36 ` [PATCH 4/4] um: chan: use blocking IO for console output for time-travel benjamin
2023-10-20 9:15 ` [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals Benjamin Beichler
3 siblings, 0 replies; 19+ messages in thread
From: benjamin @ 2023-10-18 12:36 UTC (permalink / raw)
To: linux-um; +Cc: Benjamin Berg
From: Benjamin Berg <benjamin.berg@intel.com>
In the next commit, we are going to set the output FD to be blocking.
Once that is done, the write() may be short if an interrupt happens
while it is writing out data. As such, to properly catch an EINTR error,
we need to retry the write.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
---
arch/um/drivers/chan_user.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/arch/um/drivers/chan_user.c b/arch/um/drivers/chan_user.c
index 29c68820522a..339aae56b6f1 100644
--- a/arch/um/drivers/chan_user.c
+++ b/arch/um/drivers/chan_user.c
@@ -37,11 +37,23 @@ int generic_read(int fd, char *c_out, void *unused)
int generic_write(int fd, const char *buf, int n, void *unused)
{
+ int written = 0;
int err;
- CATCH_EINTR(err = write(fd, buf, n));
- if (err > 0)
- return err;
+ /* The FD may be in blocking mode, as such, need to retry short writes,
+ * they may have been interrupted by a signal.
+ */
+ do {
+ errno = 0;
+ err = write(fd, buf + written, n - written);
+ if (err > 0) {
+ written += err;
+ continue;
+ }
+ } while (err < 0 && errno == EINTR);
+
+ if (written > 0)
+ return written;
else if (errno == EAGAIN)
return 0;
else if (err == 0)
--
2.41.0
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 4/4] um: chan: use blocking IO for console output for time-travel
2023-10-18 12:36 [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals benjamin
2023-10-18 12:36 ` [PATCH 2/4] um: chan_user: catch EINTR when reading and writing benjamin
2023-10-18 12:36 ` [PATCH 3/4] um: chan_user: retry partial writes benjamin
@ 2023-10-18 12:36 ` benjamin
2023-10-20 9:15 ` [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals Benjamin Beichler
3 siblings, 0 replies; 19+ messages in thread
From: benjamin @ 2023-10-18 12:36 UTC (permalink / raw)
To: linux-um; +Cc: Benjamin Berg
From: Benjamin Berg <benjamin.berg@intel.com>
When in time-travel mode (infinite-cpu or external) time should not pass
for writing to the console. As such, it makes sense to put the FD for
the output side into blocking mode and simply let any write to it hang.
If we did not do this, then time could pass waiting for the console to
become writable again. This is not desirable as it has random effects on
the clock between runs.
Implement this by duplicating the FD if output is active in a relevant
mode and setting the duplicate to be blocking. This avoids changing the
input channel to be blocking should it exists. After this, use the
blocking FD for all write operations and do not allocate an IRQ it is
set.
Without time-travel mode fd_out will always match fd_in and IRQs are
registered.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
---
arch/um/drivers/chan.h | 3 +-
arch/um/drivers/chan_kern.c | 81 ++++++++++++++++++++++++++++---------
arch/um/include/shared/os.h | 1 +
arch/um/os-Linux/file.c | 10 +++++
4 files changed, 74 insertions(+), 21 deletions(-)
diff --git a/arch/um/drivers/chan.h b/arch/um/drivers/chan.h
index 3fec3b8406e9..1bb62f2d7716 100644
--- a/arch/um/drivers/chan.h
+++ b/arch/um/drivers/chan.h
@@ -22,7 +22,8 @@ struct chan {
unsigned int output:1;
unsigned int opened:1;
unsigned int enabled:1;
- int fd;
+ int fd_in;
+ int fd_out; /* only different to fd_in if blocking output is needed */
const struct chan_ops *ops;
void *data;
};
diff --git a/arch/um/drivers/chan_kern.c b/arch/um/drivers/chan_kern.c
index 26a702a06515..1eff939c7ebf 100644
--- a/arch/um/drivers/chan_kern.c
+++ b/arch/um/drivers/chan_kern.c
@@ -81,6 +81,12 @@ static const struct chan_ops not_configged_ops = {
};
#endif /* CONFIG_NOCONFIG_CHAN */
+static inline bool need_output_blocking(void)
+{
+ return time_travel_mode == TT_MODE_INFCPU ||
+ time_travel_mode == TT_MODE_EXTERNAL;
+}
+
static int open_one_chan(struct chan *chan)
{
int fd, err;
@@ -96,15 +102,43 @@ static int open_one_chan(struct chan *chan)
return fd;
err = os_set_fd_block(fd, 0);
- if (err) {
- (*chan->ops->close)(fd, chan->data);
- return err;
- }
+ if (err)
+ goto out_close;
+
+ chan->fd_in = fd;
+ chan->fd_out = fd;
+
+ /*
+ * In time-travel modes infinite-CPU and external we need to guarantee
+ * that any writes to the output succeed immdiately from the point of
+ * the VM. The best way to do this is to put the FD in blocking mode
+ * and simply wait/retry until everything is written.
+ * As every write is guaranteed to complete, we also do not need to
+ * request an IRQ for the output.
+ *
+ * Note that input cannot happen in a time synchronized way. We permit
+ * it, but time passes very quickly if anything waits for a read.
+ */
+ if (chan->output && need_output_blocking()) {
+ err = os_dup_file(chan->fd_out);
+ if (err < 0)
+ goto out_close;
- chan->fd = fd;
+ chan->fd_out = err;
+
+ err = os_set_fd_block(chan->fd_out, 1);
+ if (err) {
+ os_close_file(chan->fd_out);
+ goto out_close;
+ }
+ }
chan->opened = 1;
return 0;
+
+out_close:
+ (*chan->ops->close)(fd, chan->data);
+ return err;
}
static int open_chan(struct list_head *chans)
@@ -125,7 +159,7 @@ static int open_chan(struct list_head *chans)
void chan_enable_winch(struct chan *chan, struct tty_port *port)
{
if (chan && chan->primary && chan->ops->winch)
- register_winch(chan->fd, port);
+ register_winch(chan->fd_in, port);
}
static void line_timer_cb(struct work_struct *work)
@@ -156,8 +190,9 @@ int enable_chan(struct line *line)
if (chan->enabled)
continue;
- err = line_setup_irq(chan->fd, chan->input, chan->output, line,
- chan);
+ err = line_setup_irq(chan->fd_in, chan->input,
+ chan->output && !need_output_blocking(),
+ line, chan);
if (err)
goto out_close;
@@ -196,7 +231,8 @@ void free_irqs(void)
if (chan->input && chan->enabled)
um_free_irq(chan->line->read_irq, chan);
- if (chan->output && chan->enabled)
+ if (chan->output && chan->enabled &&
+ !need_output_blocking())
um_free_irq(chan->line->write_irq, chan);
chan->enabled = 0;
}
@@ -216,15 +252,19 @@ static void close_one_chan(struct chan *chan, int delay_free_irq)
} else {
if (chan->input && chan->enabled)
um_free_irq(chan->line->read_irq, chan);
- if (chan->output && chan->enabled)
+ if (chan->output && chan->enabled &&
+ !need_output_blocking())
um_free_irq(chan->line->write_irq, chan);
chan->enabled = 0;
}
+ if (chan->fd_out != chan->fd_in)
+ os_close_file(chan->fd_out);
if (chan->ops->close != NULL)
- (*chan->ops->close)(chan->fd, chan->data);
+ (*chan->ops->close)(chan->fd_in, chan->data);
chan->opened = 0;
- chan->fd = -1;
+ chan->fd_in = -1;
+ chan->fd_out = -1;
}
void close_chan(struct line *line)
@@ -244,7 +284,7 @@ void close_chan(struct line *line)
void deactivate_chan(struct chan *chan, int irq)
{
if (chan && chan->enabled)
- deactivate_fd(chan->fd, irq);
+ deactivate_fd(chan->fd_in, irq);
}
int write_chan(struct chan *chan, const char *buf, int len,
@@ -255,7 +295,7 @@ int write_chan(struct chan *chan, const char *buf, int len,
if (len == 0 || !chan || !chan->ops->write)
return 0;
- n = chan->ops->write(chan->fd, buf, len, chan->data);
+ n = chan->ops->write(chan->fd_out, buf, len, chan->data);
if (chan->primary) {
ret = n;
}
@@ -269,7 +309,7 @@ int console_write_chan(struct chan *chan, const char *buf, int len)
if (!chan || !chan->ops->console_write)
return 0;
- n = chan->ops->console_write(chan->fd, buf, len);
+ n = chan->ops->console_write(chan->fd_out, buf, len);
if (chan->primary)
ret = n;
return ret;
@@ -297,14 +337,14 @@ int chan_window_size(struct line *line, unsigned short *rows_out,
if (chan && chan->primary) {
if (chan->ops->window_size == NULL)
return 0;
- return chan->ops->window_size(chan->fd, chan->data,
+ return chan->ops->window_size(chan->fd_in, chan->data,
rows_out, cols_out);
}
chan = line->chan_out;
if (chan && chan->primary) {
if (chan->ops->window_size == NULL)
return 0;
- return chan->ops->window_size(chan->fd, chan->data,
+ return chan->ops->window_size(chan->fd_in, chan->data,
rows_out, cols_out);
}
return 0;
@@ -320,7 +360,7 @@ static void free_one_chan(struct chan *chan)
(*chan->ops->free)(chan->data);
if (chan->primary && chan->output)
- ignore_sigio_fd(chan->fd);
+ ignore_sigio_fd(chan->fd_in);
kfree(chan);
}
@@ -479,7 +519,8 @@ static struct chan *parse_chan(struct line *line, char *str, int device,
.output = 0,
.opened = 0,
.enabled = 0,
- .fd = -1,
+ .fd_in = -1,
+ .fd_out = -1,
.ops = ops,
.data = data });
return chan;
@@ -550,7 +591,7 @@ void chan_interrupt(struct line *line, int irq)
schedule_delayed_work(&line->task, 1);
goto out;
}
- err = chan->ops->read(chan->fd, &c, chan->data);
+ err = chan->ops->read(chan->fd_in, &c, chan->data);
if (err > 0)
tty_insert_flip_char(port, c, TTY_NORMAL);
} while (err > 0);
diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index 0df646c6651e..269373e16397 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -163,6 +163,7 @@ extern int os_set_fd_block(int fd, int blocking);
extern int os_accept_connection(int fd);
extern int os_create_unix_socket(const char *file, int len, int close_on_exec);
extern int os_shutdown_socket(int fd, int r, int w);
+extern int os_dup_file(int fd);
extern void os_close_file(int fd);
extern int os_rcv_fd(int fd, int *helper_pid_out);
extern int os_connect_socket(const char *name);
diff --git a/arch/um/os-Linux/file.c b/arch/um/os-Linux/file.c
index fc4450db59bd..abf8676b834c 100644
--- a/arch/um/os-Linux/file.c
+++ b/arch/um/os-Linux/file.c
@@ -240,6 +240,16 @@ int os_connect_socket(const char *name)
return err;
}
+int os_dup_file(int fd)
+{
+ int new_fd = dup(fd);
+
+ if (new_fd < 0)
+ return -errno;
+
+ return new_fd;
+}
+
void os_close_file(int fd)
{
close(fd);
--
2.41.0
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals
2023-10-18 12:36 [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals benjamin
` (2 preceding siblings ...)
2023-10-18 12:36 ` [PATCH 4/4] um: chan: use blocking IO for console output for time-travel benjamin
@ 2023-10-20 9:15 ` Benjamin Beichler
2023-10-20 9:26 ` Anton Ivanov
2023-10-20 9:59 ` Benjamin Berg
3 siblings, 2 replies; 19+ messages in thread
From: Benjamin Beichler @ 2023-10-20 9:15 UTC (permalink / raw)
To: linux-um
[-- Attachment #1.1.1.1: Type: text/plain, Size: 1798 bytes --]
Am 18.10.2023 um 14:36 schrieb benjamin@sipsolutions.net:
> From: Benjamin Berg <benjamin.berg@intel.com>
>
> When in time-travel mode, the eventfd events are read even when signals
> are blocked as SIGIO still needs to be processed. In this case, the
> event is cleared on the eventfd but the IRQ still needs to be fired
> later.
>
> We did already ensure that the SIGIO handler is run again. However, the
> FDs are configured to be level triggered, so that eventfd will not
> notify again. As such, add some logic to mark the IRQ as pending and
> process it at the next opportunity.
>
> To avoid duplication, reuse the logic used for the suspend/resume case.
> This does not really change anything except for delaying running the
> IRQs with timetravel_handler at a slightly later point in time (and
> possibly running non-timetravel IRQs that shouldn't happen earlier).
> While at it, move marking as pending into irq_event_handler as that is
> the more logical place for it to happen.
>
> Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
I also noticed this problem, but after a discussion with Johannes Berg
about this, we come to the conclusion, that all drivers with interrupts
need to deal with time travel mode via their own time travel handler.
What I actually did, was to write a trivial handler for e.g., the serial
line, which simply only call time_travel_add_irq_event(ev).
I'm not entirely sure, what the right way to do it, although the outcome
seems to be the same, as with no time advance the actual simulation
time, when the interrupt is handled is the same.
I actually also have some other significant bug fixes on time travel in
my tree, but I was too lazy to send them here, are you interested in
taking a look at them?
[-- Attachment #1.1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3025 bytes --]
[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]
[-- Attachment #2: Type: text/plain, Size: 152 bytes --]
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals
2023-10-20 9:15 ` [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals Benjamin Beichler
@ 2023-10-20 9:26 ` Anton Ivanov
2023-10-20 10:33 ` Benjamin Beichler
2023-10-20 9:59 ` Benjamin Berg
1 sibling, 1 reply; 19+ messages in thread
From: Anton Ivanov @ 2023-10-20 9:26 UTC (permalink / raw)
To: Benjamin Beichler, linux-um
On 20/10/2023 10:15, Benjamin Beichler wrote:
> Am 18.10.2023 um 14:36 schrieb benjamin@sipsolutions.net:
>> From: Benjamin Berg <benjamin.berg@intel.com>
>>
>> When in time-travel mode, the eventfd events are read even when signals
>> are blocked as SIGIO still needs to be processed. In this case, the
>> event is cleared on the eventfd but the IRQ still needs to be fired
>> later.
>>
>> We did already ensure that the SIGIO handler is run again. However, the
>> FDs are configured to be level triggered, so that eventfd will not
>> notify again. As such, add some logic to mark the IRQ as pending and
>> process it at the next opportunity.
>>
>> To avoid duplication, reuse the logic used for the suspend/resume case.
>> This does not really change anything except for delaying running the
>> IRQs with timetravel_handler at a slightly later point in time (and
>> possibly running non-timetravel IRQs that shouldn't happen earlier).
>> While at it, move marking as pending into irq_event_handler as that is
>> the more logical place for it to happen.
>>
>> Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
>
> I also noticed this problem, but after a discussion with Johannes Berg about this, we come to the conclusion, that all drivers with interrupts need to deal with time travel mode via their own time travel handler. What I actually did, was to write a trivial handler for e.g., the serial line, which simply only call time_travel_add_irq_event(ev).
>
> I'm not entirely sure, what the right way to do it, although the outcome seems to be the same, as with no time advance the actual simulation time, when the interrupt is handled is the same.
Add an extra registration flag for um_register_irq?
enum um_irq_type {
IRQ_READ,
IRQ_WRITE,
IRQ_HAVE_TT_HANDLER,
NUM_IRQ_TYPES,
};
Ones that do not register get the default one which advances the time. Ones that do - have it left to them and do with time whatever they need.
>
> I actually also have some other significant bug fixes on time travel in my tree, but I was too lazy to send them here, are you interested in taking a look at them?
>
> _______________________________________________
> linux-um mailing list
> linux-um@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-um
--
Anton R. Ivanov
https://www.kot-begemot.co.uk/
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals
2023-10-20 9:15 ` [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals Benjamin Beichler
2023-10-20 9:26 ` Anton Ivanov
@ 2023-10-20 9:59 ` Benjamin Berg
2023-10-20 10:38 ` Benjamin Beichler
1 sibling, 1 reply; 19+ messages in thread
From: Benjamin Berg @ 2023-10-20 9:59 UTC (permalink / raw)
To: Benjamin Beichler, linux-um
On Fri, 2023-10-20 at 11:15 +0200, Benjamin Beichler wrote:
> Am 18.10.2023 um 14:36 schrieb benjamin@sipsolutions.net:
> > From: Benjamin Berg <benjamin.berg@intel.com>
> >
> > When in time-travel mode, the eventfd events are read even when signals
> > are blocked as SIGIO still needs to be processed. In this case, the
> > event is cleared on the eventfd but the IRQ still needs to be fired
> > later.
> >
> > We did already ensure that the SIGIO handler is run again. However, the
> > FDs are configured to be level triggered, so that eventfd will not
> > notify again. As such, add some logic to mark the IRQ as pending and
> > process it at the next opportunity.
> >
> > To avoid duplication, reuse the logic used for the suspend/resume case.
> > This does not really change anything except for delaying running the
> > IRQs with timetravel_handler at a slightly later point in time (and
> > possibly running non-timetravel IRQs that shouldn't happen earlier).
> > While at it, move marking as pending into irq_event_handler as that is
> > the more logical place for it to happen.
> >
> > Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
>
> I also noticed this problem, but after a discussion with Johannes Berg
> about this, we come to the conclusion, that all drivers with interrupts
> need to deal with time travel mode via their own time travel handler.
> What I actually did, was to write a trivial handler for e.g., the serial
> line, which simply only call time_travel_add_irq_event(ev).
Yes, we could instead just assert that we do not have any IRQ without a
timetravel handler. We really shouldn't have, but we do have one for
stdin unfortunately (which is at least used by the hostap hwsim tests,
so we might want to not remove it until we have a different solution).
What we could do is just add an stdin timetravel handler as you have
done, but print a warning when the stdin IRQ is registered[1]. And
then, if someone tries to register an IRQ without a time-travel handler
we just panic(), it should never happen.
Benjamin
[1] We'll also need to set stdin to be a "null" channel and then
somehow avoid registering the IRQ in that case.
> I'm not entirely sure, what the right way to do it, although the outcome
> seems to be the same, as with no time advance the actual simulation
> time, when the interrupt is handled is the same.
>
> I actually also have some other significant bug fixes on time travel in
> my tree, but I was too lazy to send them here, are you interested in
> taking a look at them?
> _______________________________________________
> linux-um mailing list
> linux-um@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-um
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals
2023-10-20 9:26 ` Anton Ivanov
@ 2023-10-20 10:33 ` Benjamin Beichler
0 siblings, 0 replies; 19+ messages in thread
From: Benjamin Beichler @ 2023-10-20 10:33 UTC (permalink / raw)
To: linux-um
Am 20.10.2023 um 11:26 schrieb Anton Ivanov:
>> I also noticed this problem, but after a discussion with Johannes Berg
>> about this, we come to the conclusion, that all drivers with
>> interrupts need to deal with time travel mode via their own time
>> travel handler. What I actually did, was to write a trivial handler
>> for e.g., the serial line, which simply only call
>> time_travel_add_irq_event(ev).
>>
>> I'm not entirely sure, what the right way to do it, although the
>> outcome seems to be the same, as with no time advance the actual
>> simulation time, when the interrupt is handled is the same.
>
> Add an extra registration flag for um_register_irq?
>
>
> enum um_irq_type {
> IRQ_READ,
> IRQ_WRITE,
> IRQ_HAVE_TT_HANDLER,
> NUM_IRQ_TYPES,
> };
>
> Ones that do not register get the default one which advances the time.
> Ones that do - have it left to them and do with time whatever they need.
>
There is already mechanics for it, as we have um_request_irq_tt and
um_request_irq. I don't think an additional type enhance the semantics.
>>
>> I actually also have some other significant bug fixes on time travel
>> in my tree, but I was too lazy to send them here, are you interested
>> in taking a look at them?
>>
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals
2023-10-20 9:59 ` Benjamin Berg
@ 2023-10-20 10:38 ` Benjamin Beichler
2023-10-20 11:39 ` Johannes Berg
0 siblings, 1 reply; 19+ messages in thread
From: Benjamin Beichler @ 2023-10-20 10:38 UTC (permalink / raw)
To: linux-um
[-- Attachment #1.1.1.1: Type: text/plain, Size: 1696 bytes --]
Am 20.10.2023 um 11:59 schrieb Benjamin Berg:
>>
>> I also noticed this problem, but after a discussion with Johannes Berg
>> about this, we come to the conclusion, that all drivers with interrupts
>> need to deal with time travel mode via their own time travel handler.
>> What I actually did, was to write a trivial handler for e.g., the serial
>> line, which simply only call time_travel_add_irq_event(ev).
>
> Yes, we could instead just assert that we do not have any IRQ without a
> timetravel handler. We really shouldn't have, but we do have one for
> stdin unfortunately (which is at least used by the hostap hwsim tests,
> so we might want to not remove it until we have a different solution).
>
> What we could do is just add an stdin timetravel handler as you have
> done, but print a warning when the stdin IRQ is registered[1]. And
> then, if someone tries to register an IRQ without a time-travel handler
> we just panic(), it should never happen.
Can you explain, why a time travel handler for stdin may be bad? It
sounds like you want to avoid it, but I see no immediate problem.
>
> Benjamin
>
> [1] We'll also need to set stdin to be a "null" channel and then
> somehow avoid registering the IRQ in that case.
>
--
M.Sc. Benjamin Beichler
Universität Rostock, Fakultät für Informatik und Elektrotechnik
Institut für Angewandte Mikroelektronik und Datentechnik
University of Rostock, Department of CS and EE
Institute of Applied Microelectronics and CE
Albert-Einstein-Str. 26
18059 Rostock
Deutschland/Germany
phone: +49 (0) 381 498 - 7278
email: Benjamin.Beichler@uni-rostock.de
www: http://www.imd.uni-rostock.de/
[-- Attachment #1.1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3025 bytes --]
[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]
[-- Attachment #2: Type: text/plain, Size: 152 bytes --]
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals
2023-10-20 10:38 ` Benjamin Beichler
@ 2023-10-20 11:39 ` Johannes Berg
2023-10-20 12:06 ` Benjamin Beichler
0 siblings, 1 reply; 19+ messages in thread
From: Johannes Berg @ 2023-10-20 11:39 UTC (permalink / raw)
To: Benjamin Beichler, linux-um
On Fri, 2023-10-20 at 12:38 +0200, Benjamin Beichler wrote:
>
> Can you explain, why a time travel handler for stdin may be bad? It
> sounds like you want to avoid it, but I see no immediate problem.
>
I need to read the thread, but this one's easy ;-)
The thing is that on such a channel you don't send an ACK when you've
seen that there's something to handle. As a result, the sender will
continue running while you're trying to request a new schedule entry
from the controller. As a result, it may run past your new schedule
entry because it didn't know about it yet (this would likely bring down
the controller and crash the simulation), or the relative order of the
two entries is undefined, in the sense that it depends on the process
scheduling of the host.
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals
2023-10-20 11:39 ` Johannes Berg
@ 2023-10-20 12:06 ` Benjamin Beichler
2023-10-20 12:20 ` Johannes Berg
0 siblings, 1 reply; 19+ messages in thread
From: Benjamin Beichler @ 2023-10-20 12:06 UTC (permalink / raw)
To: Johannes Berg, linux-um
[-- Attachment #1.1.1.1: Type: text/plain, Size: 1964 bytes --]
Am 20.10.2023 um 13:39 schrieb Johannes Berg:
> On Fri, 2023-10-20 at 12:38 +0200, Benjamin Beichler wrote:
>>
>> Can you explain, why a time travel handler for stdin may be bad? It
>> sounds like you want to avoid it, but I see no immediate problem.
>>
>
> I need to read the thread, but this one's easy ;-)
>
> The thing is that on such a channel you don't send an ACK when you've
> seen that there's something to handle. As a result, the sender will
> continue running while you're trying to request a new schedule entry
> from the controller. As a result, it may run past your new schedule
> entry because it didn't know about it yet (this would likely bring down
> the controller and crash the simulation), or the relative order of the
> two entries is undefined, in the sense that it depends on the process
> scheduling of the host.
Sorry, but I did not get this. What may run past the schedule entry? Is
your assumption, that the "thing" connected to stdin is always totally
unaware of the time travel mode?
When our (to be published) simulation send something on serial lines (I
think it does not matter whether it is a socket or a pipe), we expect
that the uml instance needs to be run as long as it changes back to
idle/wait state before the simulation time is advanced. Since the basis
model of the time travel mode is, that you have infinite amount of
processing power, the interrupt needs to be always handled at the
current time.
Maybe my think-model only holds for "smaller" amounts of data (maybe one
page or something?) or not the free-running mode, but I'm not completely
convinced. :-D
Maybe we need to define, a bit more formally, how the (designed)
processing model of interrupts in time travel mode is.
>
> johannes
>
> _______________________________________________
> linux-um mailing list
> linux-um@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-um
[-- Attachment #1.1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3025 bytes --]
[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]
[-- Attachment #2: Type: text/plain, Size: 152 bytes --]
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals
2023-10-20 12:06 ` Benjamin Beichler
@ 2023-10-20 12:20 ` Johannes Berg
2023-10-20 12:23 ` Johannes Berg
2023-10-20 12:58 ` Benjamin Beichler
0 siblings, 2 replies; 19+ messages in thread
From: Johannes Berg @ 2023-10-20 12:20 UTC (permalink / raw)
To: Benjamin Beichler, linux-um
On Fri, 2023-10-20 at 14:06 +0200, Benjamin Beichler wrote:
> Am 20.10.2023 um 13:39 schrieb Johannes Berg:
> > On Fri, 2023-10-20 at 12:38 +0200, Benjamin Beichler wrote:
> > >
> > > Can you explain, why a time travel handler for stdin may be bad? It
> > > sounds like you want to avoid it, but I see no immediate problem.
> > >
> >
> > I need to read the thread, but this one's easy ;-)
> >
> > The thing is that on such a channel you don't send an ACK when you've
> > seen that there's something to handle. As a result, the sender will
> > continue running while you're trying to request a new schedule entry
> > from the controller. As a result, it may run past your new schedule
> > entry because it didn't know about it yet (this would likely bring down
> > the controller and crash the simulation), or the relative order of the
> > two entries is undefined, in the sense that it depends on the process
> > scheduling of the host.
> Sorry, but I did not get this. What may run past the schedule entry? Is
> your assumption, that the "thing" connected to stdin is always totally
> unaware of the time travel mode?
No, I'm not assuming that, if that's done all bets are off anyway. I'm
assuming that both are connected to the controller.
> When our (to be published) simulation send something on serial lines (I
> think it does not matter whether it is a socket or a pipe), we expect
> that the uml instance needs to be run as long as it changes back to
> idle/wait state before the simulation time is advanced. Since the basis
> model of the time travel mode is, that you have infinite amount of
> processing power, the interrupt needs to be always handled at the
> current time.
Yes but you need to schedule for the interrupt, and you don't
necessarily know what 'current time' is at interrupt time.
So let's say you have "something" that's scheduled to run at times
- 1000
- 2000
- 3000
and free-until is 10000 or something high.
Now it sends a message to Linux stdio at 1000.
But it doesn't have a way to wait for ack. So it continues, checks the
schedule and free-until, and can advance time to 2000 since it doesn't
yet know Linux requested time to run at 1000.
This gives you something not predictable - it depends on the host
scheduling, whichever ran first (Linux started next or the other
continued, or both and it gets messed up).
> Maybe my think-model only holds for "smaller" amounts of data (maybe one
> page or something?) or not the free-running mode, but I'm not completely
> convinced. :-D
Nah it doesn't really matter how much data there is.
> Maybe we need to define, a bit more formally, how the (designed)
> processing model of interrupts in time travel mode is.
That's probably a separate Master's thesis ;-)
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals
2023-10-20 12:20 ` Johannes Berg
@ 2023-10-20 12:23 ` Johannes Berg
2023-10-20 12:43 ` Benjamin Beichler
2023-10-20 12:58 ` Benjamin Beichler
1 sibling, 1 reply; 19+ messages in thread
From: Johannes Berg @ 2023-10-20 12:23 UTC (permalink / raw)
To: Benjamin Beichler, linux-um
On Fri, 2023-10-20 at 14:20 +0200, Johannes Berg wrote:
>
> Yes but you need to schedule for the interrupt, and you don't
> necessarily know what 'current time' is at interrupt time.
>
> So let's say you have "something" that's scheduled to run at times
> - 1000
> - 2000
> - 3000
>
> and free-until is 10000 or something high.
>
It can also happen without free-until, then it just depends which one of
the two - they're running in parallel now (linux doing time-travel
interrupt handling and adding the event, the other thing continuing to
schedule and doing the next entry) - asks the controller first.
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals
2023-10-20 12:23 ` Johannes Berg
@ 2023-10-20 12:43 ` Benjamin Beichler
2023-10-20 12:58 ` Johannes Berg
0 siblings, 1 reply; 19+ messages in thread
From: Benjamin Beichler @ 2023-10-20 12:43 UTC (permalink / raw)
To: Johannes Berg, linux-um
>> Yes but you need to schedule for the interrupt, and you don't
>> necessarily know what 'current time' is at interrupt time.
>>
>> So let's say you have "something" that's scheduled to run at times
>> - 1000
>> - 2000
>> - 3000
>>
>> and free-until is 10000 or something high.
>>
> It can also happen without free-until, then it just depends which one of
> the two - they're running in parallel now (linux doing time-travel
> interrupt handling and adding the event, the other thing continuing to
> schedule and doing the next entry) - asks the controller first.
Since they happen at the same time, most discrete event simulations make
the assumption, that the actual order of execution should not be part of
the semantics.
Although, most of the implementations demand a deterministic order (for
repetition) but even that requirement is sometimes lifted. So this is no
contradiction to common discrete event simulations.
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals
2023-10-20 12:43 ` Benjamin Beichler
@ 2023-10-20 12:58 ` Johannes Berg
0 siblings, 0 replies; 19+ messages in thread
From: Johannes Berg @ 2023-10-20 12:58 UTC (permalink / raw)
To: Benjamin Beichler, linux-um
On Fri, 2023-10-20 at 14:43 +0200, Benjamin Beichler wrote:
> > > Yes but you need to schedule for the interrupt, and you don't
> > > necessarily know what 'current time' is at interrupt time.
> > >
> > > So let's say you have "something" that's scheduled to run at times
> > > - 1000
> > > - 2000
> > > - 3000
> > >
> > > and free-until is 10000 or something high.
> > >
> > It can also happen without free-until, then it just depends which one of
> > the two - they're running in parallel now (linux doing time-travel
> > interrupt handling and adding the event, the other thing continuing to
> > schedule and doing the next entry) - asks the controller first.
>
> Since they happen at the same time, most discrete event simulations make
> the assumption, that the actual order of execution should not be part of
> the semantics.
>
Well, fair, but they can bounce more messages around them, and that
should change things. I don't think we've built a simulation system here
that can strictly assume this is true.
In any case, it can get worse - if he sender process actually updated
the controller to 2000 by the time the linux actually calls
time_travel_add_irq_event(), the linux event will run at 2000 instead of
1000.
All this is why we have the vhost-user extensions for ACK messages etc.
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals
2023-10-20 12:20 ` Johannes Berg
2023-10-20 12:23 ` Johannes Berg
@ 2023-10-20 12:58 ` Benjamin Beichler
2023-10-20 13:39 ` Johannes Berg
1 sibling, 1 reply; 19+ messages in thread
From: Benjamin Beichler @ 2023-10-20 12:58 UTC (permalink / raw)
To: Johannes Berg, linux-um
> On Fri, 2023-10-20 at 14:06 +0200, Benjamin Beichler wrote:
>> Am 20.10.2023 um 13:39 schrieb Johannes Berg:
>>> On Fri, 2023-10-20 at 12:38 +0200, Benjamin Beichler wrote:
>>>> Can you explain, why a time travel handler for stdin may be bad? It
>>>> sounds like you want to avoid it, but I see no immediate problem.
>>>>
>>> I need to read the thread, but this one's easy ;-)
>>>
>>> The thing is that on such a channel you don't send an ACK when you've
>>> seen that there's something to handle. As a result, the sender will
>>> continue running while you're trying to request a new schedule entry
>>> from the controller. As a result, it may run past your new schedule
>>> entry because it didn't know about it yet (this would likely bring down
>>> the controller and crash the simulation), or the relative order of the
>>> two entries is undefined, in the sense that it depends on the process
>>> scheduling of the host.
>> Sorry, but I did not get this. What may run past the schedule entry? Is
>> your assumption, that the "thing" connected to stdin is always totally
>> unaware of the time travel mode?
> No, I'm not assuming that, if that's done all bets are off anyway. I'm
> assuming that both are connected to the controller.
>
>> When our (to be published) simulation send something on serial lines (I
>> think it does not matter whether it is a socket or a pipe), we expect
>> that the uml instance needs to be run as long as it changes back to
>> idle/wait state before the simulation time is advanced. Since the basis
>> model of the time travel mode is, that you have infinite amount of
>> processing power, the interrupt needs to be always handled at the
>> current time.
> Yes but you need to schedule for the interrupt, and you don't
> necessarily know what 'current time' is at interrupt time.
>
> So let's say you have "something" that's scheduled to run at times
> - 1000
> - 2000
> - 3000
>
> and free-until is 10000 or something high.
>
> Now it sends a message to Linux stdio at 1000.
>
> But it doesn't have a way to wait for ack. So it continues, checks the
> schedule and free-until, and can advance time to 2000 since it doesn't
> yet know Linux requested time to run at 1000.
Do I get it right, that you anticipate that all parts of the simulation
may run at different paces and the controller is only needed for
synchronization?
My mind model is, that simulation time advances in the controller are
only done, If all parts of the simulation reached the current simulation
time and are at the event processed/idle/wait state. Therefore, the
thing connected to stdio will not advance to some other time, if some
other component has outstanding events (i.e., an interrupt at this
case). Of course free-running is a problematic in this mind model, but
in this mode you willingly sacrifice the precision of events for
performance.
>
> This gives you something not predictable - it depends on the host
> scheduling, whichever ran first (Linux started next or the other
> continued, or both and it gets messed up).
>
>> Maybe my think-model only holds for "smaller" amounts of data (maybe one
>> page or something?) or not the free-running mode, but I'm not completely
>> convinced. :-D
> Nah it doesn't really matter how much data there is.
>
>> Maybe we need to define, a bit more formally, how the (designed)
>> processing model of interrupts in time travel mode is.
> That's probably a separate Master's thesis ;-)
It could, but some basic definition for a common understanding might help.
However, do you like to have a look at my current state of time travel
changes in github or do you think an RFC to the mailing list is better?
I like github for a quick dive into patches, but your whole workflow is
maybe different ;-)
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals
2023-10-20 12:58 ` Benjamin Beichler
@ 2023-10-20 13:39 ` Johannes Berg
2023-10-20 15:47 ` Benjamin Beichler
0 siblings, 1 reply; 19+ messages in thread
From: Johannes Berg @ 2023-10-20 13:39 UTC (permalink / raw)
To: Benjamin Beichler, linux-um
On Fri, 2023-10-20 at 14:58 +0200, Benjamin Beichler wrote:
> > On Fri, 2023-10-20 at 14:06 +0200, Benjamin Beichler wrote:
> > > Am 20.10.2023 um 13:39 schrieb Johannes Berg:
> > > > On Fri, 2023-10-20 at 12:38 +0200, Benjamin Beichler wrote:
> > > > > Can you explain, why a time travel handler for stdin may be bad? It
> > > > > sounds like you want to avoid it, but I see no immediate problem.
> > > > >
> > > > I need to read the thread, but this one's easy ;-)
> > > >
> > > > The thing is that on such a channel you don't send an ACK when you've
> > > > seen that there's something to handle. As a result, the sender will
> > > > continue running while you're trying to request a new schedule entry
> > > > from the controller. As a result, it may run past your new schedule
> > > > entry because it didn't know about it yet (this would likely bring down
> > > > the controller and crash the simulation), or the relative order of the
> > > > two entries is undefined, in the sense that it depends on the process
> > > > scheduling of the host.
> > > Sorry, but I did not get this. What may run past the schedule entry? Is
> > > your assumption, that the "thing" connected to stdin is always totally
> > > unaware of the time travel mode?
> > No, I'm not assuming that, if that's done all bets are off anyway. I'm
> > assuming that both are connected to the controller.
> >
> > > When our (to be published) simulation send something on serial lines (I
> > > think it does not matter whether it is a socket or a pipe), we expect
> > > that the uml instance needs to be run as long as it changes back to
> > > idle/wait state before the simulation time is advanced. Since the basis
> > > model of the time travel mode is, that you have infinite amount of
> > > processing power, the interrupt needs to be always handled at the
> > > current time.
> > Yes but you need to schedule for the interrupt, and you don't
> > necessarily know what 'current time' is at interrupt time.
> >
> > So let's say you have "something" that's scheduled to run at times
> > - 1000
> > - 2000
> > - 3000
> >
> > and free-until is 10000 or something high.
> >
> > Now it sends a message to Linux stdio at 1000.
> >
> > But it doesn't have a way to wait for ack. So it continues, checks the
> > schedule and free-until, and can advance time to 2000 since it doesn't
> > yet know Linux requested time to run at 1000.
>
> Do I get it right, that you anticipate that all parts of the simulation
> may run at different paces and the controller is only needed for
> synchronization?
Not sure what you mean by that.
What I _wanted_ is that only one can run at a time (from a host
perspective), so e.g. virtio stuff works like
A is running
A -> B virtio:
A puts data on ring virtio ring
A sends message on vhost-user socket to indicate ring update to B
A waits for ACK for that message
B sees the ring update message
B requests time from controller
B sends ACK to A
A continues running
A gets to the next scheduling point but now sees free-until==now
A releases time slice to controller
controller assigns time slice to B
B runs to process the virtio ring data
So you see that only one is running at a time.
Without the ACK, you'd have
B sees the ring update message
B requests time from the controller
in parallel with
A continues running
A gets to the next scheduling point
but it might either not see free-until==now, or other things.
> My mind model is, that simulation time advances in the controller are
> only done, If all parts of the simulation reached the current simulation
> time and are at the event processed/idle/wait state.
Yeah that's a simpler model, and in what we implemented it's true if you
do have the ACK messages. If don't have free-until, that's also possibly
true, although we might have implementation bugs in a sense with
time_travel_add_irq_event() since that looks at the current time. And
also, if you request something the controller thinks is already in the
past because the other side updated the controller due to free-until,
the simulation just crashes.
It's also possible that B (in the situation above) actually *does* ask
the controller who should run next, and the controller simply doesn't
yet know:
B sends message to A's stdin (say at 1000)
B requests to run at 2000 from controller
B releases time to controller
controller sets time to 2000 and tells B to run
A requests scheduling at 1000
controller *crashes*
This is actually quite likely if the host is running 6.6 with the new
EEVDF scheduler, because it will prefer to let B continue running, and
then it just needs to pick the controller over A next.
This again is because of the lack of synchronization in editing the
schedule (sending messages to the controller) which the ACK message
solves.
Hence our internal use cases for "text protocol" actually use virtio-
console rather than stdio, so you do have ACK on the vhost-user protocol
again.
> Therefore, the
> thing connected to stdio will not advance to some other time, if some
> other component has outstanding events (i.e., an interrupt at this
> case).
Yes, but by the time it actually asks "are there outstanding events",
they might no be added yet, due to hos scheduling order!
> Of course free-running is a problematic in this mind model, but
> in this mode you willingly sacrifice the precision of events for
> performance.
If you're referring to free-until: that isn't free-running, it's just
meant to reduce the communication overhead if you know nobody else is
running. The controller maintains this.
Btw we have further overhead reductions with shared memory, need to push
all that out.
> > > Maybe we need to define, a bit more formally, how the (designed)
> > > processing model of interrupts in time travel mode is.
> > That's probably a separate Master's thesis ;-)
>
> It could, but some basic definition for a common understanding might help.
:)
Does the above help a bit? But yeah maybe I should try to write it down
more formally.
> However, do you like to have a look at my current state of time travel
> changes in github or do you think an RFC to the mailing list is better?
> I like github for a quick dive into patches, but your whole workflow is
> maybe different ;-)
>
RFC to the list is probably easier to comment on etc.
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals
2023-10-20 13:39 ` Johannes Berg
@ 2023-10-20 15:47 ` Benjamin Beichler
2023-10-20 15:51 ` Johannes Berg
0 siblings, 1 reply; 19+ messages in thread
From: Benjamin Beichler @ 2023-10-20 15:47 UTC (permalink / raw)
To: Johannes Berg, linux-um
>> My mind model is, that simulation time advances in the controller are
>> only done, If all parts of the simulation reached the current
>> simulation time and are at the event processed/idle/wait state.
> Yeah that's a simpler model, and in what we implemented it's true if
> you do have the ACK messages. If don't have free-until, that's also
> possibly true, although we might have implementation bugs in a sense
> with time_travel_add_irq_event() since that looks at the current time.
> And also, if you request something the controller thinks is already in
> the past because the other side updated the controller due to
> free-until, the simulation just crashes. It's also possible that B (in
> the situation above) actually *does* ask the controller who should run
> next, and the controller simply doesn't yet know: B sends message to
> A's stdin (say at 1000) B requests to run at 2000 from controller B
> releases time to controller controller sets time to 2000 and tells B
> to run A requests scheduling at 1000 controller *crashes*
I think, I get now, why I don't have this Problem: My virtio simulation
and my controller/simulation-master are tightly coupled and indeed the
same process. When I create a new event for the UML-instance, it is
anticipated in the simulation master.
So my sequence is more like:
B sends message to A's stdin (say at 1000)
B tells the controller, that it activated A, and time should not advance
until A has sent a request for processing an interrupt and has gone back
to waiting state
B requests to run at 2000 from controller
B releases time to controller
...
I see that without further information from the device simulation to the
controller, it is quite harder.
Nonetheless, I'm not totally sure, how this interacts with the timing
semantics of the interrupts here. Either the solution of Benjamin Berg
or mine (with the trivial tt-handler have the same problem).
I will post my patches the next days, you may have a look on them, maybe
you also experienced some of my problems :-)
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals
2023-10-20 15:47 ` Benjamin Beichler
@ 2023-10-20 15:51 ` Johannes Berg
0 siblings, 0 replies; 19+ messages in thread
From: Johannes Berg @ 2023-10-20 15:51 UTC (permalink / raw)
To: Benjamin Beichler, linux-um
On Fri, 2023-10-20 at 17:47 +0200, Benjamin Beichler wrote:
>
> I think, I get now, why I don't have this Problem: My virtio simulation
> and my controller/simulation-master are tightly coupled and indeed the
> same process. When I create a new event for the UML-instance, it is
> anticipated in the simulation master.
>
> So my sequence is more like:
> B sends message to A's stdin (say at 1000)
> B tells the controller, that it activated A, and time should not advance
> until A has sent a request for processing an interrupt and has gone back
> to waiting state
> B requests to run at 2000 from controller
> B releases time to controller
> ...
Ahh, OK. I thought you were probably using our controller from the
usfstl, but of course you don't have to. I think I did push the shared
memory optimisation there though, but I suspect I haven't posted the
Linux client version.
> I see that without further information from the device simulation to the
> controller, it is quite harder.
Right, I wanted to handle that in the other side (hence the ACK) since
you might not always know _when_ the interrupt is scheduled there, even
if it's currently always "immediately".
> Nonetheless, I'm not totally sure, how this interacts with the timing
> semantics of the interrupts here. Either the solution of Benjamin Berg
> or mine (with the trivial tt-handler have the same problem).
Oh, yeah, his changes do have the same problem. He's just fixing that
interrupts got lost completely in some already _otherwise_ broken cases,
to make it work better without hanging, not to make it work correctly.
To make it work correctly you shouldn't use stdio at all, or I guess
have some external helper thing like you do :)
johannes
_______________________________________________
linux-um mailing list
linux-um@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-um
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2023-10-20 15:51 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-18 12:36 [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals benjamin
2023-10-18 12:36 ` [PATCH 2/4] um: chan_user: catch EINTR when reading and writing benjamin
2023-10-18 12:36 ` [PATCH 3/4] um: chan_user: retry partial writes benjamin
2023-10-18 12:36 ` [PATCH 4/4] um: chan: use blocking IO for console output for time-travel benjamin
2023-10-20 9:15 ` [PATCH 1/4] um: irqs: process outstanding IRQs when unblocking signals Benjamin Beichler
2023-10-20 9:26 ` Anton Ivanov
2023-10-20 10:33 ` Benjamin Beichler
2023-10-20 9:59 ` Benjamin Berg
2023-10-20 10:38 ` Benjamin Beichler
2023-10-20 11:39 ` Johannes Berg
2023-10-20 12:06 ` Benjamin Beichler
2023-10-20 12:20 ` Johannes Berg
2023-10-20 12:23 ` Johannes Berg
2023-10-20 12:43 ` Benjamin Beichler
2023-10-20 12:58 ` Johannes Berg
2023-10-20 12:58 ` Benjamin Beichler
2023-10-20 13:39 ` Johannes Berg
2023-10-20 15:47 ` Benjamin Beichler
2023-10-20 15:51 ` Johannes Berg
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox