Infrastructure
Cloud Infrastructure
Our cloud infrastructure runs on Amazon Web Services (AWS) in the Stockholm region in Sweden and uses multi-level redundancy and failover options to reduce latency and maintain reliability and scalability. AWS data centers are designed and optimized to handle business-critical applications and have multiple levels of built-in redundancy.
The Lynes infrastructure is built so that, in a disaster situation, the entire service can be set up from scratch in new data centers within a relatively short period.
Availability and Redundancy
Structurally, the Lynes application is based on an advanced architecture with hundreds of microservices. Each type of service is stateless, so multiple copies of it can run in parallel. This provides a stable form of redundancy: if one copy crashes, the other copies take over while the crashed copy is automatically restarted. Another consequence of this architecture is that a crashed service only affects a small part of the application, usually a single function, while everything else operates as usual. In practice, a single service crash means that only a few end users experience a barely noticeable, temporary disruption in a specific function.
Certain critical service types have built-in autoscaling, meaning the number of copies automatically adjusts to the current load, with a guaranteed minimum number of copies when the load is at its lowest. This provides a safety margin if the load increases unexpectedly.
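The autoscaling rule described above can be illustrated with a minimal sketch. The function name, the parameters, and the idea of a fixed capacity per copy are assumptions made for the example; they are not taken from the Lynes platform.

```python
import math


def desired_replicas(current_load: float, capacity_per_replica: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Return how many copies of a service should run for the given load.

    All names and thresholds here are hypothetical, chosen for illustration.
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    # Copies needed to carry the current load, rounded up.
    needed = math.ceil(current_load / capacity_per_replica)
    # Never go below the guaranteed minimum or above the configured maximum.
    return max(min_replicas, min(max_replicas, needed))
```

With a minimum of 2 copies, the function still returns 2 when the load is zero, which corresponds to the guaranteed minimum level that absorbs an unexpected increase in load.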
Operator Connections
There are two independent dedicated fiber connections to each mobile operator, routed through a subcontractor that operates Session Border Controllers at two different physical locations (geographic redundancy). This means we have redundant connections to all mobile operators.
The operator connections use round-robin distribution, meaning calls alternate between the two connections so that each carries about 50% of the calls. Additionally, both connections are always dimensioned so that if one goes down, the other can carry 100% of the calls.
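As a rough illustration of round-robin distribution with failover, the sketch below alternates between two connections and skips one that is marked as down. The class name, the method names, and the connection labels are hypothetical; the actual routing is handled by the Session Border Controllers and is not described in this document.

```python
from itertools import cycle


class RoundRobinRouter:
    """Alternate calls between connections, skipping any that are down."""

    def __init__(self, connections):
        self._connections = list(connections)
        self._healthy = set(self._connections)
        self._cycle = cycle(self._connections)

    def mark_down(self, name):
        self._healthy.discard(name)

    def mark_up(self, name):
        if name in self._connections:
            self._healthy.add(name)

    def next_connection(self):
        if not self._healthy:
            raise RuntimeError("no operator connection available")
        # Advance the rotation until a healthy connection is found.
        for _ in range(len(self._connections)):
            candidate = next(self._cycle)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no operator connection available")


router = RoundRobinRouter(["conn-a", "conn-b"])
normal = [router.next_connection() for _ in range(4)]    # alternates a, b, a, b
router.mark_down("conn-a")
failover = [router.next_connection() for _ in range(3)]  # all calls go via conn-b
```

Because each connection is dimensioned for the full call volume, sending every call to the remaining connection during a failure does not overload it.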
Performance
We continuously look for ways to improve product and platform performance by monitoring key performance indicators such as CPU usage, system load, memory usage, database load, and audio quality (average MOS, Mean Opinion Score).
Quality Assurance
For further development of the Lynes platform, four different service environments are used. In addition to the operational environment used by end users, there are also two separate staging environments used for testing and verification of new features and changes. Each developer also has their own development environment.
Before any change to the application goes live in the operational environment, it is tested and quality assured in the staging environments. Automated unit and system tests are used for testing and verification, complemented by manual tests.
Before code progresses from a development environment to the staging environments, it must be reviewed and approved by at least two different developers. In addition to the purely functional review, the change is evaluated from a scalability and security perspective.
Upgrade
Most service types can be upgraded in full operation, completely unnoticed by the end user. For upgrades that change the app’s user interface, version management allows a gradual transition to a new version. In this way, the new version can be verified and quality assured internally by Lynes before it is rolled out to all end users.
As for the telephony servers, new instances running the new code can be started at any time, while the old instances are put into so-called drain mode: they are first emptied of ongoing calls and then shut down.
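The drain-mode behavior can be sketched as follows, assuming that a draining instance stops accepting new calls and shuts down once its last ongoing call has ended. The class and method names are hypothetical and serve only to illustrate the idea.

```python
class TelephonyInstance:
    """Hypothetical model of a telephony server that supports drain mode."""

    def __init__(self, name):
        self.name = name
        self.active_calls = set()
        self.draining = False
        self.running = True

    def accept_call(self, call_id):
        # A draining or stopped instance refuses new calls.
        if self.draining or not self.running:
            return False
        self.active_calls.add(call_id)
        return True

    def end_call(self, call_id):
        self.active_calls.discard(call_id)
        self._shutdown_if_drained()

    def start_drain(self):
        self.draining = True
        self._shutdown_if_drained()

    def _shutdown_if_drained(self):
        # Shut down only when draining and no calls remain.
        if self.draining and not self.active_calls:
            self.running = False


old = TelephonyInstance("old")
new = TelephonyInstance("new")
old.accept_call("call-1")
old.start_drain()                    # old instance keeps call-1 but takes no new calls
routed_to_old = old.accept_call("call-2")
routed_to_new = new.accept_call("call-2")
old.end_call("call-1")               # last call ends, old instance shuts down
```

In this sketch ongoing calls are never interrupted; the old instance stays alive until its last call has ended, while new calls are taken by the instance running the new code.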