How to beat lag when developing a multiplayer RTS game (construct.net)
140 points by AshleysBrain on Jan 15, 2023 | 45 comments


For an RTS game, you probably shouldn’t be sending unit positions over the network. Rather you should send events like “User A sent units X, Y, Z to position P at time T.” You can build your game as a simulation that plays back on both clients, so both clients compute the path of units X, Y, Z and locally play back the computed path. This allows you to perfectly recreate the state on both clients without needing to synchronize every single object. The best method of implementing this, that I know of, is called rollback netcode.
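
A rough sketch of what that command-based approach could look like (the names and the trivial "pathfinder" here are made up for illustration, not taken from any particular engine):

    // Clients exchange commands, not unit positions; every client applies the
    // same command in its own deterministic simulation.
    interface MoveCommand {
      playerId: number;
      unitIds: number[];
      target: { x: number; y: number };
      executeTick: number; // simulation tick at which the command takes effect
    }

    function applyCommand(
      units: Map<number, { x: number; y: number }>,
      cmd: MoveCommand
    ): void {
      for (const id of cmd.unitIds) {
        const unit = units.get(id);
        if (unit) {
          // Stand-in for a real deterministic pathfinder: as long as every
          // client runs identical logic, positions never need to be sent.
          unit.x = cmd.target.x;
          unit.y = cmd.target.y;
        }
      }
    }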


In the same blog series I point out that this seems to be an idea carried over from the days of modems and is no longer really necessary: https://www.construct.net/en/blogs/ashleys-blog-2/rts-devlog...

1000 units in intense combat can be synced with about 50 KiB/s - so why go to the trouble of deterministic gameplay, dealing with de-sync bugs, and making it difficult to late join?
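
Back-of-envelope version of that figure, with illustrative numbers rather than the blog's exact encoding: if each unit's state delta compresses to roughly 5 bytes and you send about 10 updates per second, then

    1000 units × ~5 bytes × 10 updates/s ≈ 50,000 bytes/s ≈ 49 KiB/s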


> so why go to the trouble of deterministic gameplay, dealing with de-sync bugs, and making it difficult to late join?

The biggest reason is to remove the need for frame synchronization. In fighting games, it's important to get frame-perfect inputs. Even in an RTS, frame-perfect inputs matter, e.g. when kiting enemy units. You also get features like replays for free. Late joining really isn't an issue because the server can reconcile the state, deliver it to a late-joiner, and then continue on from there. You don't need to deliver the entire history of events to a late joiner.


It might not be necessary to get acceptable performance, but it is a fundamental countermeasure to cheating. If you send actual game state data over the network you make it very easy for cheat developers. If you add server-side sanity checks as a countermeasure, over time those checks will look more and more like the "old" architecture.


There are tons of other ways to "cheat", including merely automating behaviors faster than a human could ever do them (which is "within the rules" and yet is something people would consider "unfair") as well as snooping on internal hidden state (which is extremely common in RTS games due to "fog of war" mechanics, and yet would be needed for the simulation). Trying to prevent cheaters using technology is equivalent to Digital Rights Management and goes down the same paths of locked down computers with remote attestation and trying to use behavioral analysis to prevent unacceptable input... the entire concept ends up driving you insane at best and turning you evil at worst. If you suspect someone is cheating--whether online or in person--confront them, and, if you don't like their answers, just don't play with them anymore instead of insisting that cheating become structurally impossible.


Of course nothing can fully prevent cheating. But from a player enjoyment perspective it is a lot worse (and more obvious) if your opponent can teleport his units across the map or make his workers mine 50% more resources vs just automating actions (which is hard to distinguish from just being a very fast player) or having full map vision.

edit: The DRM-like whack-a-mole is precisely the result of the architecture described in the blog; the "old school" way makes the most destructive and obvious cheating simply impossible.


This has no bearing on cheating. You can do this approach and still have an authoritative server which clients only send their inputs to.


Worth noting that this is a very common method and used by high profile RTS's such as Starcraft 2. It does require your simulation be deterministic, of course, which can bring challenges depending on what you're simulating. Supreme Commander had challenges with determinism and physics if I recall correctly due to CPU bugs, although they eventually worked it out.


What happens if you cancel the event on computer A but it takes 500ms to reach computer B and in that time B has seen their troops and attacked?


This post[1] talks about the mechanics of implementing this netcode. The scenario you're talking about would be a "desync," where in your simulation, the opponent never saw you, but in the opponent's sim, they did. You really don't want that, as that quickly makes the game diverge and become unplayable.

What most games that use this technique do is "lockstep" - they send out a chunk of commands called a "frame" several times a second, and step the simulation forward to step n only when they have frame n from every player.

So, in your scenario, if player A cancels before troops are visible to player B, then player B's computer won't ever show the troops, because simulation B will have paused during the 500ms of network lag, since it didn't have the frame from player A.
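
A minimal sketch of that lockstep loop (hypothetical types; a real implementation also needs input delay and timeout handling):

    // Advance the simulation to tick n only once every player's command
    // frame for tick n has arrived.
    type Frame = { playerId: number; tick: number; commands: string[] };

    class LockstepDriver {
      private received = new Map<number, Map<number, Frame>>(); // tick -> player -> frame
      private currentTick = 0;

      constructor(
        private playerIds: number[],
        private step: (frames: Frame[]) => void // deterministic simulation step
      ) {}

      onFrame(frame: Frame): void {
        if (!this.received.has(frame.tick)) this.received.set(frame.tick, new Map());
        this.received.get(frame.tick)!.set(frame.playerId, frame);
        this.tryAdvance();
      }

      private tryAdvance(): void {
        // Keep stepping while a complete set of frames exists for the next
        // tick; otherwise the simulation simply pauses, as described above.
        let frames = this.received.get(this.currentTick);
        while (frames && frames.size === this.playerIds.length) {
          this.step([...frames.values()]);
          this.received.delete(this.currentTick);
          this.currentTick++;
          frames = this.received.get(this.currentTick);
        }
      }
    }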

[1] https://news.ycombinator.com/item?id=34395153


It’s worth noting that often lockstep is not appropriate. Fighting games, for instance, require frame-perfect inputs. Fighting games become hard to play perfectly if you introduce jitter into the timing by requiring frame synchronization. In a fighting game you’d just unwind whatever happened and replay the simulation as if the “real” thing happened, which could mean undoing things a user saw in certain circumstances.


That's true - I haven't seen lockstep used outside of RTSs.


Many fighting games did and do use lockstep; it’s only recently that rollback has started to become the norm, and some recent big releases still don’t have it. But fans actively clamor for rollback because it’s a far superior experience.


You probably just accept that that’s possible. There are always going to be conflicts in systems with distributed state. How you handle it depends a lot on the specific application and how often it happens.


Then you can log the whole thing and implement replays.


That's how Bungie's Myth worked.


This is a surprisingly difficult problem. It's also what leads to a lot of cheats, dupes and hacks in online multiplayer games: you basically have to trust the client at some point.

The starting point for any such discussion is (as in this article) the model where all updates are done on the server. You then have to deal with packet loss and latency spikes, but the real problem is subjective: it just doesn't feel good.

Imagine you're playing Fortnite and when you pressed your trigger you had to wait 50ms for it to acknowledge that your gun fired. That bullet has travel time so there may be another update if you hit someone.

So instead the client gives you immediate feedback and proceeds as if that shot actually happened. This may well include calculating if you hit the target. That target's position may be interpolated too, not the location you last got an update for. This feels way more responsive.
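
Sketched out, that prediction-and-reconciliation loop might look something like this (hypothetical names; the correction step is deliberately left abstract):

    // Give instant local feedback for a shot, remember the prediction, and
    // correct it later if the server disagrees.
    interface PendingShot { seq: number; firedAtMs: number; predictedHit: boolean }

    const pendingShots: PendingShot[] = [];
    let nextSeq = 0;

    function fireLocally(predictedHit: boolean): number {
      // Muzzle flash, sound and tracer all play immediately as local guesses.
      const seq = nextSeq++;
      pendingShots.push({ seq, firedAtMs: Date.now(), predictedHit });
      return seq;
    }

    function onServerResult(seq: number, actualHit: boolean): void {
      const i = pendingShots.findIndex(s => s.seq === seq);
      if (i === -1) return;
      const shot = pendingShots.splice(i, 1)[0];
      if (shot.predictedHit !== actualHit) {
        // Misprediction: smoothly correct the visible state, e.g. remove a
        // hit marker or roll back damage numbers.
      }
    }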

What if your target actually stopped moving after you shot? You get into an ordering conundrum. Now imagine if shooting that shot had recoil (which could affect both aim and position) and getting hit moved the target (e.g. getting hit by an explosive of some kind).

Doing all of this when latency can easily be >100ms and having it feel good is incredibly difficult.


>You then have to deal with packet loss and latency spikes

And fake packet loss and latency spikes introduced by cheaters.

For example, when a cheater gets killed they could send a packet from the "past" saying that they killed the other person first.


Ah memories. This was rampant in the late 00's. Soldering a physical switch on the ethernet cable to cut the internet at the most tactical times...

https://youtube.com/watch?v=5gE-ihY_EG0


Seems entirely trivial to notice the pattern of lag spiking exactly at the moment of the kill.


You can do it with a constant offset. This allows the client to see into the future while taking actions in the past. If you learn that in 100 ms you are going to be headshot you can take actions while in the past to save yourself.


Anything that is "constant" is equally trivial to detect statistically.

To hide the fact that you're always lag spiking 100ms when you see an enemy, you'd have to lag spike randomly all the time, quite a lot, and then you're asking for a kick for low ping.


Isn't this more or less the same problem as cooperative editing of a document?


I think it's exactly the same.


I read some articles on synchronizing game state in networked games that had a specific name. I remember seeing some Mortal Kombat/Killer Instinct videos, though my interest is in RTS, so I probably discovered it that way. Oh yes, it's called Rollback Netcode[0]. Probably from this Stormgate status update[1].

This is much more complex in that the client and server share commands rather than state, and they both have to be exactly deterministic in processing commands to get the new states. There's also computational overhead when rolling back and then replaying forward, which can be significant if the game state is large. Searching "netcode" in reddit.com/r/gamedev could show other solutions like the one you've come up with. Netcode has been mentioned on HN in the past as well, though I don't remember the threads--I only found them by searching.

Nice work BTW, on both the game progress and smooth playable sync.

[0] https://en.wikipedia.org/wiki/Netcode

[1] https://nichegamer.com/stormgate-first-rts-rollback-netcode


Rollback netcode ("RBN") is a big thing in Fighting Games. One of the big problems though is that a large part of the community don't fully appreciate some of the technical challenges developers have to deal with when implementing RBN. They think that because GGPO (a RBN framework) can "easily" be used on old arcade/console games, it shouldn't be an issue to implement it in newer titles.

Well, that's not entirely true for a number of reasons. For example, if you look at a game like Street Fighter II, there are around 4 unique frames of animation for throwing a fireball, so any rollback that occurs there will hardly be noticeable. In newer games, though, animation frames are interpolated in different ways, so you might have 10x more frames of "unique" animation. This makes rollbacks much more noticeable and jarring, especially in games with 3D models.

Then you have the much trickier issue of audio rollbacks. This is even noticeable in games like SFIII: 3rd Strike, where you might have the "KO!" audio effect play right as a rollback occurs, leading one player to think the round has ended before they realise a rollback occurred. For some extremely bad examples, search for some videos on Street Fighter x Tekken's terrible RBN implementation.

One of the ways developers get around these challenges is to add artificial delay to things like sound effects, add more startup frames to moves, and add massive amounts of hit-stun AKA "impact freeze", making games feel less snappy. For some games these things are not such an issue, because traditionally those games rely on "slower" inputs and higher impact freeze as part of the game's system mechanics, but in other titles the games just feel worse to play "offline" than their predecessors. For the most part, these are just "veteran player" issues; newer players are not likely to notice them.

In any case, the bottom line is that RBN is not a panacea to online lag. Your game needs to be designed from the ground up to cater for it, unless you're willing to make certain compromises in online play.


There are broadly two types of state synchronization, lockstep and dead-reckoning. From the article it sounds like this is doing dead-reckoning, where you let clients simulate locally and then resolve those differences on a continuous basis. It works really well in games where you're trying to "predict" what another player is doing at some future point in time but does worse for instant-reaction type game types. It also has the problem where bandwidth scales with each entity, so it can get out of hand pretty quickly with a large number of entities.
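
A minimal dead-reckoning sketch (the fields here are assumed, not from the article): between updates, extrapolate each entity from its last known position and velocity, then blend toward the authoritative value when the next update arrives.

    interface EntityUpdate { x: number; y: number; vx: number; vy: number; atMs: number }

    function extrapolate(u: EntityUpdate, nowMs: number): { x: number; y: number } {
      const dt = (nowMs - u.atMs) / 1000;
      // Linear extrapolation; fancier schemes blend in acceleration or
      // smoothly correct toward the next authoritative update.
      return { x: u.x + u.vx * dt, y: u.y + u.vy * dt };
    }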

The other approach (and the one most RTS games use) is called lockstep, where you have a fully deterministic game simulation and clients send their inputs to each other and all of them run in "lockstep" with each other. Generally this adds latency but has the benefit of scaling well to a large number of entities. It also requires your game simulation be deterministic, which is extra fun when you have crossplay with different CPU architectures/OSes. If you've ever hit a "sync/divergence" error in those types of games, that's usually a gamestate checksum that failed and bailed (some games were smart enough to re-sync the gamestate, which is similar to host migration in P2P titles).

Rollback is a bit of a hybrid of both, where you run closer to dead-reckoning (i.e. clients simulate all entities), but as inputs from other clients come in, the simulation is re-wound to that point (i.e. "rolled back") and then re-run with the new inputs + some smoothing. It works really well for reaction based games, which is why you see it widely used in fighting games, and an early version of that was used in many FPSes for critical things like hit-tests (with fun byproducts of getting "warped" around a corner in high latency situations if getting hit changed things like movespeed).
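
A compressed sketch of that rollback loop (made-up types; real engines also cap how far back they will rewind and manage snapshots much more carefully):

    type Inputs = Map<number, string>;              // playerId -> input for a tick
    interface SimState { tick: number; data: unknown }

    class RollbackSim {
      private snapshots = new Map<number, SimState>(); // tick -> saved state
      private inputs = new Map<number, Inputs>();      // tick -> known inputs

      constructor(private simulate: (s: SimState, i: Inputs) => SimState) {}

      advance(state: SimState, localInputs: Inputs): SimState {
        this.snapshots.set(state.tick, state);
        this.inputs.set(state.tick, localInputs);
        return this.simulate(state, localInputs);     // predicted next state
      }

      onLateInput(tick: number, playerId: number, input: string, current: SimState): SimState {
        const known = this.inputs.get(tick) ?? new Map<number, string>();
        known.set(playerId, input);
        this.inputs.set(tick, known);
        // Rewind to the snapshot at 'tick', then re-simulate forward with the
        // corrected inputs, replacing the mispredicted states.
        let state = this.snapshots.get(tick) ?? current;
        for (let t = tick; t < current.tick; t++) {
          state = this.simulate(state, this.inputs.get(t) ?? new Map<number, string>());
        }
        return state;
      }
    }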

Networking in games is an absolute blast: you have the fun technical problems outlined above, but there's also an aspect of psychology/game design where a lot of what you're doing is "masking" latency in a way that's not visible to the user. A simple example here is playing hit sounds/effects locally but resolving them server-side. The 100-200ms latency isn't noticeable if there are things happening client-side. From the game design side you can have games where it's about predicting where an action will happen 200ms-1000ms+ in the future, which is much more latency tolerant. It's how games like Subspace[1] back in '99 were able to do 100s of players with 250-500ms latency and still have a high fidelity game, since it was all about prediction instead of twitch reaction.

[1] https://en.wikipedia.org/wiki/SubSpace_(video_game)


I think I probably wasted half my teenage years on Subspace/Continuum. What a blast that game was! Not to mention the diverse player base. I noticed they even listed on Steam a few years back, but being 38 with 4 kids doesn’t lend much time to such things anymore.


The 3 problems described here (latency, jitter, and packet loss) are the same exact problems that any real-time UDP application will face. The WebRTC spec for video streaming uses and extends the RTP protocol [0] to deal with all those.

Presented solutions include jitter buffers to account for packets that arrive late or out of order; synchronized clocks; stream timelines... all those also exist in any typical WebRTC application, plus others such as packet retransmission requests, or even detecting available bandwidth on each client (to adjust video bitrate accordingly).

Of course, RTP is originally meant for video transmission, but I wonder how much of the typical RTP stack (like that from GStreamer or FFmpeg libraries) might be reusable and useful as a baseline for implementing the online part of a videogame.

[0]: https://www.rfc-editor.org/rfc/rfc3550.html



Note that while still quite complex and difficult to grasp for newcomers, WebRTC is now much more stable and well implemented by web browsers. However, bugs and interoperability issues have still been rampant in recent years, so I don't want to think about how the scene was in 2013...

Regardless, in my above comment I was thinking of a lower level. Not WebRTC per-se, but the plain RTP protocol.

Every RTP (UDP) packet has a timestamp included in the header (with NTP wall-clock timestamps carried in the accompanying RTCP reports). RTP endpoints (both client and server) have a well defined set of calculations used to synchronize packets based on their timestamps, and also infer network latency, jitter, even clock skew, that kind of thing.

RTP packet headers also have a sequence number. This is used to detect out of order, or missing packets. Then the packets can be reordered, or (if it makes sense) a retransmission request be sent back to the sender.
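
For reference, the interarrival jitter estimate that RTP receivers compute (defined in RFC 3550, section 6.4.1) is simple enough to sketch directly:

    // D compares the spacing of two packets at the receiver with their spacing
    // at the sender; J is a running average updated on every packet.
    let jitter = 0;
    let prevSendTs: number | null = null;   // timestamp from the packet header
    let prevArrival: number | null = null;  // local arrival time, same units

    function onPacket(sendTs: number, arrival: number): void {
      if (prevSendTs !== null && prevArrival !== null) {
        const d = (arrival - prevArrival) - (sendTs - prevSendTs);
        jitter += (Math.abs(d) - jitter) / 16;
      }
      prevSendTs = sendTs;
      prevArrival = arrival;
    }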

I.e. a lot of the problems that were being described in the original article, seem to be already handled by RTP.

So I was thinking of how realistic it would be to use multiplayer game data, instead of video frames as the RTP payload, to leverage all these network-related mechanisms of the protocol.


Very nice. I hadn't seen such a clear explanation of PDV and the solution with synthetic delay before. I've designed several latency critical protocols using these techniques and some that build on this, but haven't before seen such a clear explanation. My own "slides" were harder to follow when I once had to explain how the protocol works.

Could have saved me months of thought 10 years ago if one had existed. ;)

It's always validating to see someone else arrive at similar conclusions when facing a challenging technical problem. To the author, I encourage you to keep pushing (if you want to), it's possible to replace the fixed synthetic delay with a control loop. :D This makes both the PDV _and_ the system latency approach their minima over time.
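
A speculative sketch of one way such a control loop could work (the constants are made-up tuning values, not the commenter's actual protocol): grow the synthetic delay quickly when a packet arrives late, and shrink it slowly while packets keep arriving on time, so the delay hovers just above the current PDV.

    let delayMs = 100;            // current synthetic delay
    const minDelayMs = 10;
    const growMs = 20;            // additive increase after a late arrival
    const shrinkMs = 0.5;         // gentle decrease per on-time packet

    function onPacketTiming(arrivedLate: boolean): void {
      if (arrivedLate) {
        delayMs += growMs;        // we under-estimated the delay variation
      } else {
        delayMs = Math.max(minDelayMs, delayMs - shrinkMs); // probe downward
      }
    }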


I don't think there's usually a need for a separate ping timer: just have each client include the sequence number of the last packet it received in its responses. The receiver can then compare where it thinks the game state should be against where the peer actually is, getting a timing estimate on every packet instead of only being able to sync up the time every ~2 seconds.
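
Sketch of that piggybacking idea (assumed message shape): every outgoing packet carries the last sequence number received, so the peer gets a fresh round-trip estimate from every packet rather than only from dedicated pings.

    interface Packet { seq: number; ackSeq: number; payload: unknown }

    let localSeq = 0;
    let lastReceivedSeq = -1;

    function buildPacket(payload: unknown, sendTimes: Map<number, number>): Packet {
      const p = { seq: localSeq++, ackSeq: lastReceivedSeq, payload };
      sendTimes.set(p.seq, Date.now()); // remember when each seq went out
      return p;
    }

    function onPacketReceived(p: Packet, sendTimes: Map<number, number>): number | undefined {
      lastReceivedSeq = p.seq;
      const sentAt = sendTimes.get(p.ackSeq);
      return sentAt !== undefined ? Date.now() - sentAt : undefined; // fresh RTT estimate
    }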


I think it’s important to note that it’s impossible to “beat” lag. The only thing you can do is hide lag in different places. There are a bunch of solutions that all have different trade-offs.

This solution is in roughly the same space as one I wrote about a long time ago. https://www.forrestthewoods.com/blog/tech_of_planetary_annih...


Agreed - you can't beat it.

I guess some day we can have quantum entangled FPS games so it's all instant and there is no lag? I only sorta half joke.


The old trick (circa AOE2 and SC) is to delay execution of the user input long enough to account for most latency in propagating input to other players (usually 100 to 500 ms), and to hide the delay through audio or visual acknowledgements of the input. Input was executed on "ticks" which did not directly correspond to time.
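
Roughly, the trick looks like this (illustrative tick numbers and stubbed functions): acknowledge the click instantly with sound/animation, but schedule the command for a tick far enough in the future that every player will have received it by then.

    const INPUT_DELAY_TICKS = 4;  // e.g. 4 ticks at ~10 ticks/s ≈ 400 ms

    const scheduled = new Map<number, string[]>(); // executeTick -> commands

    function onPlayerClick(command: string, currentTick: number): void {
      playAcknowledgement();                       // instant local feedback
      const executeTick = currentTick + INPUT_DELAY_TICKS;
      const list = scheduled.get(executeTick) ?? [];
      list.push(command);
      scheduled.set(executeTick, list);
      broadcastToPeers(command, executeTick);      // assumed network send
    }

    function playAcknowledgement(): void { /* stub: play "acknowledged" sound */ }
    function broadcastToPeers(command: string, tick: number): void { /* stub */ }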


You don’t even necessarily need to mask the latency. You can just accept inputs at a low but consistent tick rate, and players will be able to adjust as long as packet loss and jitter are minimal.


A “classic” paper in the space that might give some historical context to anyone interested, https://www.gamedeveloper.com/design/the-internet-sucks-or-w...


This just showed up "1500 Archers on a 28.8: Network Programming in Age of Empires and Beyond"

[0] https://news.ycombinator.com/item?id=34395153


I am painfully aware of this. Here's my "Merrily we bump along" posting.[1] As environments and simulation get more realistic, this gets harder.

It's especially tough for big-world systems where not all the users in one area are anywhere near close physically. Sharded systems can put all the users on servers near their location, but big-world systems don't have that option.

[1] https://community.secondlife.com/forums/topic/451190-merrily...


Wouldn’t it be better for estimated latency to be calculated for each direction?

The ping message could have the client’s current time, and the pong message would have this client time and the server time.


That doesn’t help - you don’t know the difference between the two clocks
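
Worked out with the usual four timestamps, you can see why: with t0 = client send, t1 = server receive, t2 = server send, t3 = client receive, only the total round trip is observable; splitting it into two one-way delays requires assuming the directions are symmetric.

    function roundTripMs(t0: number, t1: number, t2: number, t3: number): number {
      return (t3 - t0) - (t2 - t1);   // time on the wire, both directions combined
    }

    function estimatedOffsetMs(t0: number, t1: number, t2: number, t3: number): number {
      // NTP-style clock offset estimate; it assumes the two directions are
      // equal, so any asymmetry just becomes an error in this number.
      return ((t1 - t0) + (t2 - t3)) / 2;
    }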


Similar to the problem of the one-way speed of light: https://en.wikipedia.org/wiki/One-way_speed_of_light


As someone who plays fps games, I always have two responses when I see some BS happen during the game with things like lag, hit reg, etc.

As a consumer, my first reaction is fuck activision this is bullshit, bunch of incompetent dimwits.

The flip side however, as someone who knows better, sympathizes and says, wow it’s amazing this works as well as it does.

Awesome article, especially for anyone who plays games and does technical work.


Trailing State Synchronization




