A Distributed Multiplayer Game Server System
Eric Cronin, Burton Filstrup and Anthony Kurc
Introduction: Real-time multiplayer games require low-latency network transport and a high degree of game state consistency. This combination is extremely difficult to achieve because messages can be delayed indefinitely in the network. To meet these requirements, the authors propose a Mirrored-Server architecture, a synchronization mechanism called Trailing State Synchronization, and a low-latency reliable multicast protocol.
Description: Most current game servers use a Client-Server architecture, while a few operate in a peer-to-peer fashion. The Client-Server architecture is simple to implement, keeps the game state consistent, and gives the gaming company a single point of administrative control. But it also has disadvantages: it scales poorly and suffers from latency problems. Peer-to-peer systems have significantly lower latency, but maintaining game state consistency and administrative control in them is very hard. The authors propose a Mirrored-Server architecture to address this trade-off.
The proposal assumes that the gaming company owns a low-latency private network, that several mirrors act as gateways to this network, and that each client is connected to one of these mirrors. The mirrors use the private network for fast communication with each other. This communication is made reliable by the Clocked Reliable Interactive Multicast Protocol (CRIMP) proposed in the paper. Trailing State Synchronization (TSS), also introduced in the paper, keeps the game state consistent between mirrors. The Mirrored-Server system is scalable because clients connect to mirrors, and more mirrors can be added if necessary. It addresses the latency problem through the low-latency private network, while the gaming company retains full administrative control of the mirrors.
Trailing State Synchronization is a novel method for synchronizing game state, introduced in this paper. The game the authors chose for the proof of concept is Quake. In a fast-paced first-person shooter like Quake, users issue commands at a very high rate. This rules out synchronization mechanisms such as Timewarp, which must keep a copy of the game state for each executed command. TSS addresses the problem by executing several copies of the game state in parallel: the leading state runs with no added latency, and each of the remaining states runs a few milliseconds behind the state that precedes it. Each of these parallel executions resembles Bucket Synchronization run with a different delay. To detect inconsistencies, each synchronizer looks at the changes in game state produced by executing a command and compares them with the immediately preceding state. If an inconsistency is discovered, a rollback from the trailing state to the leading state is performed. This method gives a high degree of consistency while preserving low latency and scalability.
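The idea of a leading state and a delayed trailing state can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the game state is reduced to a single integer, the hypothetical commands "add" and "mul" are chosen only because their order matters, and there is one trailing state rather than several. The repair (rollback) step is left out, so the sketch only shows how a late command is detected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    time: int  # simulation time (ms) at which the player issued the command
    op: str    # "add" or "mul" (hypothetical ops; execution order matters)
    arg: int


def apply_command(value, cmd):
    """Return the new state value after executing one command."""
    if cmd.op == "add":
        return value + cmd.arg
    if cmd.op == "mul":
        return value * cmd.arg
    raise ValueError(f"unknown op: {cmd.op}")


class TrailingStateSketch:
    """One leading state plus one trailing state that runs delay_ms behind."""

    def __init__(self, delay_ms, initial=1):
        self.delay_ms = delay_ms
        self.leading = initial   # executes commands as soon as they arrive
        self.trailing = initial  # executes in timestamp order, delay_ms late
        self.log = []            # commands not yet executed by the trailing state

    def receive(self, cmd):
        """The leading state executes immediately, in arrival order."""
        self.leading = apply_command(self.leading, cmd)
        self.log.append(cmd)

    def advance_trailing(self, now_ms):
        """Execute commands that are at least delay_ms old, in timestamp order.

        Returns True if the leading state disagrees with what the trailing
        state implies, i.e. an inconsistency that would trigger a rollback.
        """
        cutoff = now_ms - self.delay_ms
        ordered = sorted(self.log, key=lambda c: c.time)
        due = [c for c in ordered if c.time <= cutoff]
        newer = [c for c in ordered if c.time > cutoff]
        for cmd in due:
            self.trailing = apply_command(self.trailing, cmd)
        self.log = newer
        # Value the leading state should hold: trailing state plus the
        # commands the trailing state has not reached yet, in correct order.
        expected = self.trailing
        for cmd in newer:
            expected = apply_command(expected, cmd)
        return expected != self.leading


# A command issued at t=10 arrives after one issued at t=20.
late = TrailingStateSketch(delay_ms=50)
late.receive(Command(20, "mul", 3))
late.receive(Command(10, "add", 2))
late_detected = late.advance_trailing(now_ms=70)

# The same commands arriving in timestamp order.
in_order = TrailingStateSketch(delay_ms=50)
in_order.receive(Command(10, "add", 2))
in_order.receive(Command(20, "mul", 3))
in_order_detected = in_order.advance_trailing(now_ms=70)
```

In the first run the leading state executed the commands out of order, so the check reports an inconsistency; in the second run the two states agree. The sketch relies on the assumption that every command reaches the mirror within delay_ms of being issued, which is why a real system might want more than one trailing state.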
The third element of the Mirrored-Server system is the CRIMP protocol. The need for low latency prompted the authors to introduce a receiver-based reliable multicast layer tailored to the requirements of the architecture. Several other enhancements to increase the performance of the multicast layer are also introduced. In a receiver-based protocol, the receivers detect losses and send a recovery request, which is answered by any host that has the packet. The protocol is tuned through certain variables (such as the probability of generating a request or a response), giving an efficient communication mechanism with minimal overhead. The layer also provides bootstrapping so that new mirrors can join, loss detection, cancellation of recovery, and server management capability.
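The receiver-based recovery described above can be sketched roughly as follows. This is not the CRIMP wire format or the authors' code: the class names Receiver and Host, the use of plain sequence numbers to detect losses, and the respond_probability parameter are all assumptions made for illustration.

```python
import random


class Receiver:
    """Detects losses from gaps in sequence numbers (assumed scheme)."""

    def __init__(self):
        self.next_seq = 0   # lowest sequence number not yet seen or requested
        self.received = {}  # seq -> payload

    def on_packet(self, seq, payload):
        """Store a packet and return the sequence numbers to request."""
        self.received[seq] = payload
        missing = [s for s in range(self.next_seq, seq)
                   if s not in self.received]
        self.next_seq = max(self.next_seq, seq + 1)
        return missing


class Host:
    """Any host holding a packet may answer a recovery request."""

    def __init__(self, cache, respond_probability, rng):
        self.cache = cache                              # seq -> payload
        self.respond_probability = respond_probability  # tunable (assumed)
        self.rng = rng

    def on_request(self, seq):
        """Return the payload, or None if this host stays silent."""
        if seq in self.cache and self.rng.random() < self.respond_probability:
            return self.cache[seq]
        return None


receiver = Receiver()
first = receiver.on_packet(0, "a")
second = receiver.on_packet(1, "b")
gap = receiver.on_packet(4, "e")  # packets 2 and 3 were lost on the way

eager = Host({2: "c", 3: "d"}, respond_probability=1.0, rng=random.Random(0))
silent = Host({2: "c", 3: "d"}, respond_probability=0.0, rng=random.Random(0))

# Recover the missing packets from the host that always responds.
for seq in gap:
    payload = eager.on_request(seq)
    if payload is not None:
        receiver.on_packet(seq, payload)
```

Lowering the response probability reduces duplicate replies when many hosts hold the same packet, at the cost of a higher chance that nobody answers a given request. This is presumably the kind of trade-off the tunable variables in the paper control.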
Conclusion: The authors tested the synchronization mechanism through simulations. However, they did not carry out a thorough study of TSS under different configurations. The CRIMP protocol was also put through only a very basic test. Using a simple virtual topology and a two-way FreeBSD Dummynet bridge with a delay of 25 ms and a variety of packet loss rates, the multicast layer was evaluated in terms of perceived RTT, total losses and duplicate ACKs. Although the results are fairly positive, a significant amount of testing is still needed before any conclusion can be drawn.
Discussion: The analysis of the paper resulted in some interesting questions and discussions.
One possible source of confusion is whether the delay in synchronization causes the execution to go back in time when a rollback is performed. Although it may seem so, it does not. On rollback, the trailing state that identified the inconsistency is copied over the leading state, and all commands from that trailing state's time up to the current execution time of the leading state are then re-executed. The leading state thus remains at the current execution time, but after having executed all the commands in the correct order.
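The rollback described here can be sketched in a few lines. The state layout (a dictionary with clock_ms and value), the tuple form of the commands, and the function names are hypothetical and chosen only to show that the leading state ends up at the current time and not at the trailing state's time.

```python
import copy


def execute(state, command):
    """Apply one (time_ms, op, arg) command to the state in place."""
    _, op, arg = command
    if op == "add":
        state["value"] += arg
    elif op == "mul":
        state["value"] *= arg
    else:
        raise ValueError(f"unknown op: {op}")


def roll_back(trailing, command_log, now_ms):
    """Rebuild the leading state from a trailing state.

    The trailing state is copied, then every logged command issued after
    the trailing state's clock and no later than now_ms is re-executed in
    timestamp order. The result is again a state for time now_ms.
    """
    leading = copy.deepcopy(trailing)
    replay = [c for c in command_log
              if trailing["clock_ms"] < c[0] <= now_ms]
    for command in sorted(replay, key=lambda c: c[0]):
        execute(leading, command)
    leading["clock_ms"] = now_ms
    return leading


# Trailing state is at t=100; the leading state is at t=150 and had
# executed the two commands below in arrival order, i.e. the wrong order.
trailing = {"clock_ms": 100, "value": 3}
log = [(110, "mul", 3), (105, "add", 2)]
leading = roll_back(trailing, log, now_ms=150)
```

After the call, the leading state holds the value produced by the correctly ordered commands and its clock still reads the current time, while the trailing state is left untouched.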
Since the rollback happens during game play, while the user is playing on the leading state, it is natural to ask whether it is perceptible to the user. This depends on the delay of the trailing state whose execution identified the inconsistency. If the delay is under 100 ms, the change could be imperceptible to the user. With delays of more than 100 ms it is hard to say.
Several protocols exist that address file synchronization in distributed systems. The authors did not look into these protocols, possibly because the timescale required for file consistency is orders of magnitude longer than for real-time games.
Are there any limits to the number of mirrors that can exist in the system? A limit here would affect the scalability of the system. Theoretically, there is no limit to the number of mirrors. However, practical constraints such as the available bandwidth could impose one. The tests performed by the authors are inadequate to draw any conclusions.