As more content went into the game (units, houses, terrain textures), loading got noticeably slower. Being able to launch the game and check changes as quickly as possible is critical in early development. It disrupts the workflow a lot when I need to restart the game dozens of times in a row to check changes made in code. Optimizations had to be applied.
There are hundreds of places that could be optimized in an alpha version of a game. Many of those optimizations would improve performance just a bit, and many would take a lot of time and complicate the code. The aim is to find the “best” optimization: one that takes the least time to implement, preferably simplifies the code (yes, that is possible) and brings the biggest performance improvement.
The proper way to deal with this is to profile the game. But since we are talking only about the initial loading time, we can get by with a much simpler approach: logging. For that, the game is instrumented with functions that write a timestamp for each loading step into the log.
- Cursors resources loaded – timestamp.
- Fonts resources loaded – timestamp.
- Houses resources loaded – timestamp.
- you get the idea
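A minimal sketch of that kind of instrumentation, written in Python purely for illustration. The `LoadLog` class and the commented-out loading calls are made up for the example and are not the game's actual code:

```python
import time


class LoadLog:
    """Records how long each loading step took (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._last = clock()
        self.entries = []  # (step name, seconds since the previous step)

    def step(self, name):
        # Call right after a loading step finishes.
        now = self._clock()
        self.entries.append((name, now - self._last))
        self._last = now

    def report(self):
        # Slowest steps first, so the optimization target stands out.
        return sorted(self.entries, key=lambda e: e[1], reverse=True)


log = LoadLog()
# load_cursors()  # hypothetical loading step
log.step("Cursors resources loaded")
# load_fonts()    # hypothetical loading step
log.step("Fonts resources loaded")
# load_houses()   # hypothetical loading step
log.step("Houses resources loaded")

for name, seconds in log.report():
    print(f"{name}: {seconds * 1000:.1f} msec")
```

Sorting the report by duration is just one convenient way to read such a log; a plain chronological dump works as well.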
The log revealed that, on average, full game loading from exe launch to playable mission takes 13.4 seconds, 8 of which are spent loading PNG textures for units, houses and terrain (95 images in total, with loading times ranging from 5 to 1800 msec). This is the target now.
I have little control over the PNG library that loads the textures. Switching to another one could be a thing to try, but PNGs are known to be slow to load, so I did not spend time on that. There are still several viable options:
- Change the texture format to one that does not require lengthy unpacking and can be loaded onto the GPU quicker.
- Keep PNG, which takes a lot of time to unpack, but use all available CPU cores by parallelizing the loading across several threads.
- Make a special build with much smaller textures.
The first option is best for the release version of the game: loading times are the best, but it takes time to convert the textures and they cannot be changed easily after that.
The third option is obviously the worst alternative, since it breaks the game art. It could probably be used for simulation runs (e.g. runs where the AI gets tested).
I chose the second option because it does not make working with textures any different than it is now. Multithreading is also an interesting task that I have little experience with. And the result can be applied to other tasks (I mainly have model loading in mind, but other applications are viable too).
Keeping generic requirements in mind (the designed class should work for textures, models and other yet unknown applications with minimal changes), I came to the following simple layout:
- The game loads resources by domain (houses, units, tiles, etc.) rather than by type (models, textures, animations, etc.). Textures get collected from each step into a list as the game parts are loaded.
- Once that’s done, a flag chooses between the old approach (load everything in the main thread one by one) and the new multi-threaded way. This way I can safely implement the new algorithm while always being able to switch back to the old one if anything goes wrong or needs checking. It also lets me compare loading times precisely.
- Then there is a worker thread: a thread that requests a piece of work by reporting to its owner. If there’s no work, it destroys itself. One worker loads one texture at a time.
- And there is a pool thread: a thread that keeps a list of tasks to do and the workers to take them. The pool thread gets the CPU core count from the OS and creates that many worker threads.
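The layout above could be sketched roughly like this. It is an illustration in Python under my own assumptions, not the game's code: `TexturePool`, `load_textures` and `decode_texture` are hypothetical names, and the decode function is a placeholder for the real PNG library call.

```python
import os
import threading


def decode_texture(path):
    # Placeholder for the real PNG decode; returns a fake "texture".
    return f"decoded:{path}"


class TexturePool(threading.Thread):
    """Pool thread: owns the task list and the workers (hypothetical sketch)."""

    def __init__(self, paths, load_fn=decode_texture, worker_count=None):
        super().__init__()
        self.tasks = list(paths)
        self.load_fn = load_fn
        self.guard = threading.Lock()  # protects tasks and results
        self.results = {}
        # One worker per CPU core reported by the OS, unless overridden.
        self.worker_count = worker_count or os.cpu_count() or 1

    def next_task(self):
        # Workers report here to request work; None means nothing is left.
        with self.guard:
            return self.tasks.pop() if self.tasks else None

    def worker(self):
        while True:
            path = self.next_task()
            if path is None:
                return  # no work left: the worker ends itself
            texture = self.load_fn(path)  # one texture at a time
            with self.guard:
                self.results[path] = texture

    def run(self):
        workers = [threading.Thread(target=self.worker)
                   for _ in range(self.worker_count)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()


def load_textures(paths, multithreaded=True, load_fn=decode_texture):
    if not multithreaded:
        # Old approach: everything in the calling thread, one by one.
        return {p: load_fn(p) for p in paths}
    pool = TexturePool(paths, load_fn=load_fn)
    pool.start()
    pool.join()  # during initial loading we simply wait for the pool
    return pool.results


textures = load_textures(["house.png", "unit.png", "tile.png"])
print(len(textures))
```

Keeping both paths behind one `multithreaded` flag makes it easy to compare timings or fall back to the old loader. Note that in CPython specifically, a real speed-up depends on the decoder releasing the global interpreter lock while it works; the sketch shows the structure rather than the performance.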
Why make the pool a separate thread? It allows the pool to run in the background, for example to reload textures on the fly without freezing the game (not tested yet).
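To illustrate the idea only (this part is untested in the game itself), here is a small Python sketch in which a background thread stands in for the pool while the main loop keeps running. The file names, the `reload_textures` function and the sleep-based "decoding" are all assumptions made up for the example:

```python
import threading
import time


def reload_textures(paths, out):
    # Stand-in for the pool thread's job; sleep imitates slow decoding.
    for p in paths:
        time.sleep(0.01)
        out[p] = f"decoded:{p}"


textures = {}
pool = threading.Thread(
    target=reload_textures,
    args=(["grass.png", "stone.png"], textures),  # hypothetical file names
)
pool.start()

frames = 0
while pool.is_alive():
    frames += 1        # the game would keep updating and rendering here
    time.sleep(0.001)
pool.join()

print(f"rendered {frames} frames while reloading {len(textures)} textures")
```

The main loop never blocks on the reload itself; it only checks whether the background thread has finished.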
Once implemented, this cut texture loading time from 8 sec down to 2.6 sec. Quite nice if you ask me. But of course I plan to add more and more models, textures and other kinds of content, which will worsen loading times again. So at some point I will have to repeat the process and find another “best” optimization 🙂