Reporting outages across the League
A few hours ago, my node had its first unplanned outage. It won't be the last, and since this one could have affected other nodes' ability to federate with mine, I think there's an open question of how we should communicate node outages across the League. In particular:
Is this something we even need to do?
What constitutes an outage that other nodes need to be aware of?
Do we need a process for operators to follow in the event of an unplanned outage?
Should operators be expected to communicate planned outages (e.g. system updates, data migrations) across the League?
Where should node operators communicate outages?
kouhai Mon 30 Sep 2024 9:55AM
@sirocyl oh that's in eventual wl-spanner scope
spanner (n):
- A hand tool for adjusting nuts and bolts; a wrench.
- One who, or that which, spans.
- A problem, dilemma or obstacle; something unexpected or troublesome (in the phrase spanner in the works)
sirocyl · Mon 30 Sep 2024 8:09AM
In this early stage, I'm not sure.
Eventually, we should have something on the League's end that monitors node uptime on a regular basis and validates that node behavior complies with requirements.
This would be an automated process, perhaps with an RSS and AP feed, e.g. "@nodewatch@broadcast.websiteleague.org", that posts and @-mentions node operators when a node has trouble contacting, or being contacted from, wlorg's server.
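To make the idea a bit more concrete, here is a rough sketch of the check-and-notify half of such a monitor, in Python. Everything named here (`Node`, `http_probe`, `outage_notices`, the notice wording) is made up for illustration, not an existing League tool. It assumes that "reachable" simply means the node answers an HTTPS request with a non-5xx status, and it leaves out actually publishing to an RSS or AP feed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List
import urllib.error
import urllib.request


@dataclass
class Node:
    domain: str    # hypothetical node domain, e.g. "node.example"
    operator: str  # hypothetical operator handle to @-mention


def http_probe(domain: str, timeout: float = 10.0) -> bool:
    """Assumed reachability check: any non-5xx HTTPS response counts as up."""
    try:
        with urllib.request.urlopen(f"https://{domain}/", timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # The server answered; treat only server-side errors as an outage.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, TLS problems, etc.
        return False


def outage_notices(
    nodes: Iterable[Node],
    probe: Callable[[str], bool] = http_probe,
) -> List[str]:
    """Build one notice per unreachable node, mentioning its operator."""
    notices = []
    for node in nodes:
        if not probe(node.domain):
            notices.append(
                f"{node.operator} {node.domain} could not be reached "
                "from the League's monitor."
            )
    return notices
```

Run on a schedule (cron or similar), the returned strings could be handed to whatever ends up posting to the feed. The probe is passed in as a parameter so the notice logic can be tried out without touching the network.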