# Reconnecting
`HomeServerApi` notifies `Reconnector` of network call failure
`Reconnector` listens for online/offline events
`Reconnector` polls `/versions` with a `RetryDelay` (implemented as ExponentialRetryDelay, also used by SendScheduler if no retry_after_ms is given)
`Reconnector` emits an event when sync and message sending should retry
`Sync` listens to `Reconnector`
`Sync` notifies when the catchup sync has happened
`Reconnector` has state:
- disconnected (and retrying at x seconds from timestamp)
- reconnecting (call /versions, and if successful /sync)
- connected
`Reconnector` has a method to try to connect now
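
The retry loop and the three states described above could look roughly like the sketch below. It is not the actual implementation: the constructor arguments, the doubling factor, the `onStatus` callback and the injected `checkVersions` and `sleep` functions are all assumptions made for illustration, and the "try to connect now" method is left out.

```javascript
// Hypothetical sketch; names, parameters and the doubling factor are assumptions.
const ConnectionStatus = Object.freeze({
    Disconnected: "disconnected",   // waiting until the next retry
    Reconnecting: "reconnecting",   // calling /versions
    Connected: "connected",
});

class ExponentialRetryDelay {
    constructor(start, max) {
        this._start = start;
        this._max = max;
        this._current = start;
    }

    // Returns the delay (in ms) to wait now and doubles it for the next attempt.
    next() {
        const delay = this._current;
        this._current = Math.min(this._current * 2, this._max);
        return delay;
    }

    reset() {
        this._current = this._start;
    }
}

class Reconnector {
    // checkVersions: async () => boolean, true if /versions responded.
    // sleep: async (ms) => void, injected so no real timers are needed.
    // onStatus: called with every new status (stands in for emitting an event).
    constructor({retryDelay, checkVersions, sleep, onStatus = () => {}}) {
        this._retryDelay = retryDelay;
        this._checkVersions = checkVersions;
        this._sleep = sleep;
        this._onStatus = onStatus;
        this.status = ConnectionStatus.Connected;
    }

    _setStatus(status) {
        this.status = status;
        this._onStatus(status);
    }

    // Would be called by HomeServerApi when a network call fails.
    async onRequestFailed() {
        this._retryDelay.reset();
        this._setStatus(ConnectionStatus.Disconnected);
        while (this.status !== ConnectionStatus.Connected) {
            await this._sleep(this._retryDelay.next());
            this._setStatus(ConnectionStatus.Reconnecting);
            const reachable = await this._checkVersions();
            this._setStatus(reachable ? ConnectionStatus.Connected : ConnectionStatus.Disconnected);
        }
    }
}
```

Injecting `sleep` keeps the delay policy separate from the timer, which would also make it easier to share the same `RetryDelay` with `SendScheduler`.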
`SessionStatus` can be:
- disconnected (and retrying at x seconds from timestamp)
- reconnecting
- connected (and syncing)
- doing catchup sync
- sending x / y messages
Rooms should report how many messages they have queued up, and report again each time they send one?
`SendReporter` (passed from `Session` to `Room`, passed down to `SendQueue`), with:
- setPendingEventCount(roomId, count). This should probably use the generic Room updating mechanism, e.g. a pendingMessageCount on Room that is updated. Then session listens for this in `_roomUpdateCallback`.
`Session` listens to `Reconnector` to update its status, but perhaps we wait to send messages until catchup sync is done
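
As a rough illustration of the `pendingMessageCount` idea, the sketch below has each room report its count through a room update callback, with the session summing the counts. `Room` and `Session` are reduced to the bare minimum here, and `createRoom` and `setPendingMessageCount` are hypothetical names, not existing methods.

```javascript
// Hypothetical sketch: Room and Session are stripped down to what is needed here.
class Room {
    constructor(roomId, roomUpdateCallback) {
        this.id = roomId;
        this.pendingMessageCount = 0;
        this._roomUpdateCallback = roomUpdateCallback;
    }

    // Would be called by the SendQueue whenever an event is queued or sent.
    setPendingMessageCount(count) {
        this.pendingMessageCount = count;
        this._roomUpdateCallback(this);
    }
}

class Session {
    constructor() {
        this._rooms = new Map();
        this.pendingMessageCount = 0;
    }

    createRoom(roomId) {
        const room = new Room(roomId, updatedRoom => this._roomUpdateCallback(updatedRoom));
        this._rooms.set(roomId, room);
        return room;
    }

    // Recomputes the session-wide total whenever any room reports an update.
    _roomUpdateCallback(_updatedRoom) {
        let total = 0;
        for (const room of this._rooms.values()) {
            total += room.pendingMessageCount;
        }
        this.pendingMessageCount = total;
    }
}
```

Recomputing the sum on every update is the simplest option; keeping a running delta would avoid the loop if the number of rooms makes that worthwhile.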
# TODO
- DONE: finish (Base)ObservableValue
    - put in own file
    - add waitFor (won't this leak if the promise never resolves?)
    - decide whether we want to inherit (no?)
- DONE: cleanup Reconnector with recent changes, move generic code, make imports work
- DONE: add SyncStatus as ObservableValue of enum in Sync
- DONE: cleanup SessionContainer
- DONE: move all imports to non-default
- DONE: remove #ifdef
- DONE: move EventEmitter to utils
- DONE: move all lower-cased files
- DONE: change main.js to pass in a creation function of a SessionContainer instead of everything it is replacing
- DONE: adjust BrawlViewModel, SessionPickViewModel and LoginViewModel to use a SessionContainer
- DONE: show load progress in LoginView/SessionPickView and do away with loading screen
- DONE: rename SessionsStore to SessionInfoStorage
- make sure we've renamed all \*State enums and fields to \*Status
- add pendingMessageCount prop to SendQueue and Room, aggregate this in Session
- DONE: add completedFirstSync to Sync, so we can check if the catchup or initial sync is still in progress
- DONE: update SyncStatusViewModel to use reconnector.connectionStatus, sync.completedFirstSync, session.syncToken (is initial sync?) and session.pendingMessageCount to show these messages:
    - DONE: disconnected, retrying in x seconds. [try now].
    - DONE: reconnecting...
    - DONE: doing catchup sync
    - syncing, sending x messages
    - DONE: syncing
    - perhaps we will want to put this as an ObservableValue on the SessionContainer?
    - NO: When connected, syncing and not sending anything, just hide the thing for now? Although when you send messages it will just pop in and out all the time.
- see if it makes sense for SendScheduler to use the same RetryDelay as Reconnector
- DONE: finally adjust all file names to their class names? e.g. camel case
- see if we want more dependency injection
    - for classes from outside sdk
    - for internal sdk classes? probably not yet
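
On the `waitFor` leak question in the list above: one possible answer is to return a handle that can be disposed, so the subscription is removed even if the awaited value never arrives. The sketch below is only an illustration of that idea; the `{promise, dispose}` return shape, the `handlerCount` getter and the simplified `ObservableValue` are assumptions, not the real class.

```javascript
// Hypothetical sketch of an ObservableValue with a disposable waitFor.
class ObservableValue {
    constructor(initialValue) {
        this._value = initialValue;
        this._handlers = new Set();
    }

    get() {
        return this._value;
    }

    set(value) {
        if (value !== this._value) {
            this._value = value;
            // Copy first, because handlers may unsubscribe while we iterate.
            for (const handler of Array.from(this._handlers)) {
                handler(value);
            }
        }
    }

    // Returns a function that removes the handler again.
    subscribe(handler) {
        this._handlers.add(handler);
        return () => this._handlers.delete(handler);
    }

    get handlerCount() {
        return this._handlers.size;
    }

    // Resolves once predicate(value) is true. Calling dispose() removes the
    // subscription and rejects the promise, so nothing stays registered when
    // the caller gives up waiting.
    waitFor(predicate) {
        if (predicate(this._value)) {
            return {promise: Promise.resolve(this._value), dispose() {}};
        }
        let unsubscribe;
        let rejectPromise;
        const promise = new Promise((resolve, reject) => {
            rejectPromise = reject;
            unsubscribe = this.subscribe(value => {
                if (predicate(value)) {
                    unsubscribe();
                    resolve(value);
                }
            });
        });
        return {
            promise,
            dispose() {
                unsubscribe();
                rejectPromise(new Error("waitFor was disposed"));
            },
        };
    }
}
```

The caller still has to remember to call `dispose()`, for example when a view model is torn down, so this moves the leak risk rather than removing it entirely.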
thought: do we want to retry a request a couple of times when we can't reach the server before handing it over to the reconnector? Note that some requests may succeed while others fail: when matrix.org is really slow, some requests may time out and others may not. Although starting a service like sync while it is still succeeding should be mostly fine. Perhaps we can pass a canRetry flag to the HomeServerApi so that, if we get a ConnectionError, we retry. Only when the flag is not set would we call the Reconnector. The downside of this is that if 2 parts of the code are doing requests, 1 retrying and 1 not, and both requests fail, the retrying part would still be retrying when the reconnector has already kicked in. The HomeServerApi should perhaps tell the retryer to give up if a non-retrying request already caused the reconnector to kick in?
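
One way the `canRetry` idea could work is sketched below. Everything here is hypothetical: `requestWithRetry`, `maxAttempts`, the `isReconnecting` flag and the decision to hand over to the reconnector once the retries are exhausted are assumptions chosen to make the trade-off concrete, not a description of `HomeServerApi`.

```javascript
// Hypothetical sketch of retrying a request before involving the reconnector.
class ConnectionError extends Error {}

// request: async () => result, throws ConnectionError if the server is unreachable.
// reconnector: object with an isReconnecting flag and an onRequestFailed() method.
async function requestWithRetry(request, reconnector, {canRetry = false, maxAttempts = 3} = {}) {
    const attempts = canRetry ? maxAttempts : 1;
    for (let attempt = 1; attempt <= attempts; attempt += 1) {
        try {
            return await request();
        } catch (err) {
            if (!(err instanceof ConnectionError)) {
                throw err;
            }
            const isLastAttempt = attempt === attempts;
            // Give up early if a non-retrying request elsewhere already
            // caused the reconnector to kick in.
            if (isLastAttempt || reconnector.isReconnecting) {
                if (!reconnector.isReconnecting) {
                    reconnector.onRequestFailed();
                }
                throw err;
            }
        }
    }
}
```

Checking `isReconnecting` after each failure is what addresses the downside mentioned above: a retrying caller stops as soon as another caller has triggered the reconnector.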
CatchupSync should also use timeout 0; otherwise, when there is nothing to report, we spend 30s with a catchup spinner. Riot-web sync also says something about using a 0 timeout until there are no more to_device messages, as they are queued up by the server and not all returned at once if there are a lot? This is needed for crypto to be aware of all to_device messages.
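
The timeout choice could be expressed as a small helper like the one below. This is a sketch under assumptions: `nextSyncTimeout` and its argument shape are made up for illustration, and the to_device rule follows the uncertain recollection of Riot-web above rather than anything verified. The `to_device.events` path is where a `/sync` response carries those messages.

```javascript
// Hypothetical helper: picks the timeout (in ms) for the next /sync request.
function nextSyncTimeout({isCatchupSync, lastResponse}, longPollTimeout = 30000) {
    // Catchup sync should return immediately instead of long-polling.
    if (isCatchupSync) {
        return 0;
    }
    // Keep polling without delay while the server is still handing out
    // queued to_device messages.
    const toDeviceEvents = lastResponse?.to_device?.events ?? [];
    if (toDeviceEvents.length > 0) {
        return 0;
    }
    return longPollTimeout;
}
```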
We should have a persisted observable value `syncCount` on Sync that just increments with every sync. This way we could have other parts of the app, like account data, observe this and take action if something hasn't synced down within a number of syncs. E.g. account data could assume local changes that got sent to the server got subsequently overwritten by another client if the remote echo didn't arrive within 5 syncs, and we could attempt conflict resolution or give up. We could also show a warning that there is a problem with the server if our own messages don't come down from the server in x syncs. We'd need to store the current syncCount with pieces of pending data like account data and pendingEvents.
Are overflows of this number a problem to take into account? Don't think so, because Number.MAX_SAFE_INTEGER is 9007199254740991, so if you sync on average once a second (which you won't, as you're offline often) it would take Number.MAX_SAFE_INTEGER/(3600*24*365) = 285616414.72415626 years to overflow.
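
A minimal sketch of both points, the overdue check and the overflow estimate. `isEchoOverdue` and the `sentAtSyncCount` field are hypothetical names for "the syncCount stored with a piece of pending data"; the threshold of 5 is the example from the text.

```javascript
// Hypothetical sketch: pending data remembers the syncCount at which it was sent.
function isEchoOverdue(pendingItem, currentSyncCount, maxSyncs = 5) {
    return currentSyncCount - pendingItem.sentAtSyncCount >= maxSyncs;
}

// Overflow estimate from the text: one sync per second, 365-day years.
const SECONDS_PER_YEAR = 3600 * 24 * 365;
const yearsUntilOverflow = Number.MAX_SAFE_INTEGER / SECONDS_PER_YEAR;

const pendingAccountData = {type: "m.direct", sentAtSyncCount: 100};
console.log(isEchoOverdue(pendingAccountData, 103)); // false, only 3 syncs have passed
console.log(isEchoOverdue(pendingAccountData, 105)); // true, 5 syncs without an echo
console.log(Math.floor(yearsUntilOverflow));         // 285616414
```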