GameDev Log #1: "Grimoire"
Aug 03, 2020
For the past two months, I've been working on my own game engine built with Rust instead of C or C++. It's a decent excuse to also start a devlog to chronicle my progress and journey. I'm new to game programming in general, so this devlog series will likely catch a bunch of mistakes along the way. That being said, if you've got questions or want to correct me, feel free to email me and let me know!
Writing a game engine is no simple task. The common and well-established wisdom is that independent developers who are serious about developing (and shipping!) a game are better off picking one of the leading game engines on the market, such as Godot, Unity, or Unreal, instead of building one from scratch.
I am ignoring all of these warnings. I'm crazy enough to give it a try.
Why?
- Game engines are extremely performance-driven. Low-level knowledge of how a game lays out its code and data in memory, both in system memory and on the GPU, and of how it behaves in multithreaded contexts, is critical to achieving the smoothest experience for your would-be gamers. If any project would give me the opportunity to go deep, figure this stuff out, and learn a ton while having fun, writing a game engine definitely would.
- Games are cool. The work I do in my day job is endlessly fascinating and gives me a lot to think about, but in my heart of hearts, I'm a gamer and I love games: board games, video games, game design, anything. Writing a game engine gives me an excuse to work on something I absolutely love, even if the engine work requires a fair bit of linear algebra, strong concurrency skills and an eye for architecture.
- I like a good challenge. Call me arrogant if you'd like, but I believe that I can do it!
What is a game engine anyway?
A game engine is the coding framework that allows a video game to function on whatever platform (PC, console, etc.) it was written for. As a concept, the game engine was born from game developers' need to separate their games' low-level and core components from their art assets, music assets, and game-specific logic and data.
For example, a game engine might have a platform layer with a custom API for interfacing with a variety of devices: for rendering a window to a screen, for receiving inputs from a game controller vs a keyboard and mouse, for allocating and deallocating data in memory, for playing audio through speakers, whether those speakers exist on a TV screen or on a surround sound system, etc.
Game engines will often also include other core functionality, such as rendering images, simulating lighting effects, simulating physics like gravity and collisions, etc. For games played over the internet, the engine would be in charge of all that difficult-to-get-correct network code, i.e., synchronizing data with all gamers participating in the same world, and if possible, preventing the possibility of cheating.
Introducing: Project "Grimoire"
The working name for my game engine is "Grimoire". After all, what's a game engine if not a bunch of black magic behind the scenes that makes videogames look, feel, and work great?
Before anyone asks (not that I would expect anyone to be interested in using it), I've chosen not to open source my game engine. I will still share a lot of its code here in my blog in the form of snippets and explanations, but I want to keep open the chance for me to make a commercial game with this thing someday.
Grimoire used to be codenamed "Bulwark"
I started a Trello board to keep myself organized. Even within the past two months, I've set this project aside for a few days at a time, and after getting back to it, I had to remember exactly where I left off. Plus, the stories/ticketing system and swim lane UI just work better for me.
Out of the Loop
The "Game Loop" is the heart of any game engine. There is almost always at least one infinite loop somewhere in the code. After all, if it's up to the gamer to decide when to shut off the game and stop playing, your engine has to keep the whole program alive until they decide to do that.
A lot happens within the game loop, but it can be over-simplified into three groups of functions like so:
[Figure: over-simplified game loop, with three steps: input, update, render]
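In code, that over-simplified loop might look roughly like the sketch below. Everything in it is made up for illustration and is not Grimoire's actual code: the `Game` struct, the `process_input`, `update`, and `render` names, and the stand-in "player" who holds a direction and quits on the third frame are all assumptions.

```rust
/// Hypothetical game state; a real engine tracks far more than this.
struct Game {
    running: bool,
    move_requested: bool,
    position: f64,
    frames_rendered: u32,
}

impl Game {
    /// 1. Input: sample what the player is asking for this frame.
    fn process_input(&mut self) {
        // Stand-in for real device polling: pretend the player holds a
        // direction and asks to quit during the third frame.
        self.move_requested = true;
        if self.frames_rendered >= 2 {
            self.running = false;
        }
    }

    /// 2. Update: advance the game world by one step.
    fn update(&mut self) {
        if self.move_requested {
            self.position += 1.0;
        }
    }

    /// 3. Render: draw the current state (here we only count the frame).
    fn render(&mut self) {
        self.frames_rendered += 1;
    }
}

/// The "infinite" loop: it only ends once the player decides to quit.
fn run(game: &mut Game) {
    while game.running {
        game.process_input();
        game.update();
        game.render();
    }
}

fn main() {
    let mut game = Game {
        running: true,
        move_requested: false,
        position: 0.0,
        frames_rendered: 0,
    };
    run(&mut game);
    println!("rendered {} frames", game.frames_rendered);
}
```

Each pass through the `while` loop is one frame, which is why the time spent inside it matters so much.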
You'll notice that the third item there is a "Render". Every iteration through the loop constitutes a single frame of the game, which means "time" in your game world progresses in discrete snapshots of images, shown one by one to your viewer. If you can get this loop to complete 10 iterations per second, you will have reached roughly the minimum rate at which the human eye perceives a sequence of images as motion. Any gamer knows that 10 frames-per-second (FPS) has an awful game feel; these days, anything less than 30 FPS drastically reduces your game's playability, and the common expectation is that most games should run at 60 FPS or better.
The math works out like this: at 60 FPS, each frame gets 1/60th of a second, so as a game engine programmer, you need to write code that can complete the game loop in less than 16.66 milliseconds. Always. No matter what is going on in the game: whether the player's character is leisurely strolling through a quiet idyllic landscape, or in the middle of a twitch-speed combat sequence with hundreds of particle effects, enemies, projectiles, and boundaries all on screen.
So, okay. 16.66 milliseconds. How hard can it be? Let's dive deeper.
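To make that budget concrete, here is a small sketch that derives the per-frame time from a target frame rate and checks a measured frame against it. The `frame_budget` and `within_budget` helpers are hypothetical names of my own, and the summing loop is only a stand-in for real frame work.

```rust
use std::time::{Duration, Instant};

/// Time available for one frame at the given target frame rate.
/// Assumes `target_fps` is greater than zero.
fn frame_budget(target_fps: u32) -> Duration {
    Duration::from_secs_f64(1.0 / f64::from(target_fps))
}

/// True if a frame that took `elapsed` fit inside the budget.
fn within_budget(elapsed: Duration, target_fps: u32) -> bool {
    elapsed <= frame_budget(target_fps)
}

fn main() {
    let budget = frame_budget(60);
    println!("60 FPS budget: {:.2} ms", budget.as_secs_f64() * 1000.0);

    // Time a stand-in "frame" of work.
    let start = Instant::now();
    let work: u64 = (0..1_000u64).sum();
    let elapsed = start.elapsed();
    println!(
        "frame (sum = {}) took {:?}, within budget: {}",
        work,
        elapsed,
        within_budget(elapsed, 60)
    );
}
```

The same helper shows why dropping to 30 FPS doubles the time available per frame, and why aiming higher than 60 FPS shrinks it quickly.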
Just tell me what to do!
User input is what makes a game a game instead of a movie. It is interactive. All games will have to handle user input in some way. So how do we do it?
Let's chart a bunch of frames in a graph with time, t, extending out towards infinity. For now, t0 represents the beginning of time for your game: the moment that your gamer booted up the program. t1 is a point in time some arbitrary number of milliseconds in the future from t0. Same goes for t2, which is some arbitrary number of milliseconds following t1. Each of these points marks the completion of just one iteration of our game loop.
User inputs can come from a variety of devices. On PC, it could be a keyboard or a mouse. On a console, it could be your game controller. It could be a joystick. Some controllers provide gyroscopic data that tell you in what orientation the controller is being held or whether it's being swung around or held still, e.g., Nintendo Switch's Joy Cons, or the Nintendo Wii's remote controller.
Game engines need to sample from these devices to acquire data on which buttons (if any) are currently being pressed and when they were pressed. Since the update phase of the game loop relies on a clear picture of what the user has input, the sampled data is often collected/collated and evaluated together for each frame.
Which means that inputs are often handled one frame late. In other words, game controller data collected in the time interval between t0 and t1 will be observed at the earliest at time t2, which follows after the rendering step in the t1 to t2 interval. If the game is progressing fast enough, 16.66ms is typically too quick to notice the delay.
Grimoire's Input Handling Implementation
Grimoire is built using Simple DirectMedia Layer 2 (SDL2), which does a lot of things, but relevant to this discussion, it provides me with low-level access to controller inputs.
SDL2 is written in C, but there are bindings available to use it with Rust, aptly named rust-sdl2.
Using SDL2, we get access to something called an EventPump, which is basically a queue containing Events from the operating system/application, including user inputs. You can poll from the EventPump and read the events in sequential order, processing each user input event in turn.
To use a simple example, if the gamer presses the spacebar and immediately releases it (like a tap of the key), the EventPump will contain a KeyDown event with the spacebar's Scancode, followed by a KeyUp event with the same Scancode. In contrast, if the gamer pressed the spacebar and held it down, we would see a single KeyDown event with no KeyUp event.
To store the results, Grimoire defines a massive struct that holds the state of every input.
pub struct InputState {
    w: KeyboardButtonState,
    a: KeyboardButtonState,
    s: KeyboardButtonState,
    d: KeyboardButtonState,
    spacebar: KeyboardButtonState,
    // every other keyboard button, mouse button, etc.
}

pub struct KeyboardButtonState {
    pub pressed: bool, // key was pressed during the current frame
    pub held: bool,    // key was still held by the end of the frame
}
On each frame, we pull out all of the Events currently in the EventPump and iterate over each one. If we see a KeyDown, we set the pressed flag to true. If we see a KeyDown without an accompanying KeyUp, we also set the held flag to true. On subsequent frames, we reset the pressed flag to false and allow the held flag's value to persist onto the next frame, since it's entirely possible for the gamer to hold the button for multiple frames at a time.
It kind of looks like this, but not exactly:
fn process_events<T>(state: &mut InputState, events: T)
where
    T: IntoIterator<Item = Event>,
{
    // clear the per-frame flags before looking at this frame's events
    state.reset();
    for event in events {
        match event {
            // `repeat: false` skips the OS's auto-repeated KeyDown events
            Event::KeyDown {
                scancode: Some(scancode),
                repeat: false,
                ..
            } => {
                match scancode {
                    Scancode::W => state.w.pressed = true,
                    Scancode::A => state.a.pressed = true,
                    Scancode::S => state.s.pressed = true,
                    Scancode::D => state.d.pressed = true,
                    Scancode::Space => state.spacebar.pressed = true,
                    // much more stuff for the held flag plus all other keys
                    _ => {}
                }
            }
            Event::KeyUp {
                scancode: Some(scancode),
                repeat: false,
                ..
            } => {
                // do some stuff
            }
            // every other kind of event is ignored here
            _ => {}
        }
    }
}
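Since the snippet above skips the held flag, here is a self-contained sketch of how the two flags could evolve for a single key. It is an illustration under my own assumptions, not the engine's real code: `KeyState`, `KeyEvent`, and `process_frame` are hypothetical stand-ins for `KeyboardButtonState`, SDL2's `Event`, and `process_events`.

```rust
/// Per-key state, mirroring the two flags described above.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct KeyState {
    pressed: bool, // a key-down arrived during the current frame
    held: bool,    // the key was still down at the end of the frame
}

/// Hypothetical stand-in for the two SDL2 event kinds that matter here.
#[derive(Debug, Clone, Copy)]
enum KeyEvent {
    Down,
    Up,
}

impl KeyState {
    /// Apply one frame's worth of events for this key.
    fn process_frame(&mut self, events: &[KeyEvent]) {
        // `pressed` only describes the current frame; `held` carries over.
        self.pressed = false;
        for event in events {
            match event {
                KeyEvent::Down => {
                    self.pressed = true;
                    self.held = true;
                }
                KeyEvent::Up => self.held = false,
            }
        }
    }
}

fn main() {
    let mut space = KeyState::default();

    // Tap within a single frame: pressed, but no longer held at frame end.
    space.process_frame(&[KeyEvent::Down, KeyEvent::Up]);
    println!("tap: {:?}", space);

    // Press and keep holding across several frames, then release.
    space.process_frame(&[KeyEvent::Down]);
    println!("hold, first frame: {:?}", space);
    space.process_frame(&[]);
    println!("hold, next frame: {:?}", space);
    space.process_frame(&[KeyEvent::Up]);
    println!("release: {:?}", space);
}
```

Note that two taps inside one frame would still leave `pressed` as a single `true`, which is exactly the limitation discussed next.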
The downside to this approach is that if the gamer somehow managed to tap a button twice in a single frame, we would only register that as a single tap. I found this to be unlikely though, and for the game I have in mind, such granularity won't matter, so I'm not going to handle this edge case.
The other challenge to this approach is that it does nothing for evaluating cross-frame input patterns. For example, let's pretend we were making a Street Fighter clone. If we wanted to detect when the gamer entered a combo like Down + Right + Fierce, it would have to occur across multiple frames. Expecting these keys to be pressed on literally subsequent frames might be too tight/unforgiving to the player. We might want to allow the second and third key to be pressed within a window of frames instead of the immediate one.
For now, I've punted that kind of input handling for later, leaving it for the game logic to evaluate instead of the engine. Something for "Future Chris" to handle.
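For a rough idea of what that game-side logic could look like, here is a sketch of a combo detector that accepts each next press as long as it arrives within a window of frames. None of this exists in Grimoire today: the `Button` enum, `ComboDetector`, the `max_gap` parameter, and the frame numbers are all hypothetical, and the sketch assumes a non-empty sequence and frame numbers that never go backwards.

```rust
/// Hypothetical button names for a fighting-game style combo.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Button {
    Down,
    Right,
    Fierce,
}

/// Tracks progress through a combo, allowing up to `max_gap` frames
/// between one press and the next.
struct ComboDetector {
    sequence: Vec<Button>,
    max_gap: u64,
    next_index: usize,
    last_frame: u64,
}

impl ComboDetector {
    fn new(sequence: Vec<Button>, max_gap: u64) -> Self {
        ComboDetector { sequence, max_gap, next_index: 0, last_frame: 0 }
    }

    /// Feed one press; returns true when the full combo has just completed.
    fn on_press(&mut self, frame: u64, button: Button) -> bool {
        // Too slow since the previous step: start over.
        let gap = frame.saturating_sub(self.last_frame);
        if self.next_index > 0 && gap > self.max_gap {
            self.next_index = 0;
        }
        if self.sequence.get(self.next_index) == Some(&button) {
            self.next_index += 1;
        } else if self.sequence.first() == Some(&button) {
            // Wrong button, but it can count as a fresh first step.
            self.next_index = 1;
        } else {
            self.next_index = 0;
        }
        self.last_frame = frame;

        if self.next_index == self.sequence.len() {
            self.next_index = 0;
            return true;
        }
        false
    }
}

fn main() {
    let moves = vec![Button::Down, Button::Right, Button::Fierce];
    let mut combo = ComboDetector::new(moves, 8);
    let presses = [(10, Button::Down), (14, Button::Right), (20, Button::Fierce)];
    for (frame, button) in presses {
        let done = combo.on_press(frame, button);
        println!("frame {}: {:?}, combo complete: {}", frame, button, done);
    }
}
```

Tuning `max_gap` is then a game-feel decision rather than an engine one, which is part of why leaving it to the game logic seems reasonable.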
What about Mouse Events?
If the mouse worked like a game controller, i.e., a click was equivalent to a keypress, then it would work the same as the above. This is pretty common in first-person shooter games where a left mouse-click often means "fire gun" regardless of where the mouse cursor is located.
If we wanted to actually care about where the user was clicking, like in graphical user interfaces (GUI) such as a menu screen, inventory item screen, dialogue screen, etc., it gets much more involved.
2D screen coordinates
Your game screen window is a two-dimensional surface with a finite size measured in pixels, using x for width and y for height.
SDL2's EventPump emits mouse clicks as MouseButtonDown and MouseButtonUp Events that each include the (x, y) coordinates for where the cursor was located at the time of the button down or button up.
[Figure caption: In case it's not super clear already how it works]
Grimoire provides the engine user (me) with a way to register clickable UI buttons on the screen with a known position and size. With this information, when a user clicks with their mouse, we can compare the click location with the location of any and all clickable UI buttons to determine whether the user actually clicked a button or nothing at all.
First, to detect whether a successful click occurred, every Clickable object registered with Grimoire must have a BoundingBox defined. The Clickable objects are registered in an Entity Component System (ECS) called specs, which will allow me to very easily access them all again later. ECS is a huge topic to get into here, so I'll avoid it for now. In any case, the Clickable and BoundingBox structs look like this:
pub struct Clickable {
    pub bounds: BoundingBox,
    pub action: Action,
}

pub struct BoundingBox {
    pub dimensions: Vector2D, // height and width of the box
    pub offset: Point,        // the top-left corner of the box
}
Second, I implemented a bare-bones Axis-Aligned Bounding Box (AABB) collision detection function that can return a true or false value for whether some point (the click location) "collides" with some BoundingBox.
fn is_aabb_colliding(bounding_box: &BoundingBox, rhs: &Point) -> bool {
    bounding_box.iter_over_paired_dimensions().zip(rhs.iter()).all(
        |(rect_coordinates, point_coordinate)| {
            let (min_rect, max_rect) = fast_pair_order(rect_coordinates);
            point_coordinate >= min_rect && point_coordinate <= max_rect
        },
    )
}

#[inline]
fn fast_pair_order(nums: (f64, f64)) -> (f64, f64) {
    if nums.0 < nums.1 {
        (nums.0, nums.1)
    } else {
        (nums.1, nums.0)
    }
}
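The function above leans on iterator helpers that aren't shown here, so the following is a spelled-out sketch of the same point-in-box idea with each axis written explicitly. The `x` and `y` field names on `Point` and `Vector2D`, and the `contains_point` and `ordered` helpers, are assumptions for illustration and may not match the engine's real types.

```rust
/// Stand-ins for the engine's math types (field names are assumptions).
#[derive(Debug, Clone, Copy)]
struct Point {
    x: f64,
    y: f64,
}

#[derive(Debug, Clone, Copy)]
struct Vector2D {
    x: f64,
    y: f64,
}

struct BoundingBox {
    dimensions: Vector2D, // width (x) and height (y) of the box
    offset: Point,        // the top-left corner of the box
}

/// Return the pair as (smaller, larger).
fn ordered(a: f64, b: f64) -> (f64, f64) {
    if a < b { (a, b) } else { (b, a) }
}

/// True if `point` lies inside `bounds`, edges included.
fn contains_point(bounds: &BoundingBox, point: &Point) -> bool {
    // Each axis spans from the offset to offset + dimension. Ordering the
    // pair first keeps the check valid even if a dimension is negative.
    let (min_x, max_x) = ordered(bounds.offset.x, bounds.offset.x + bounds.dimensions.x);
    let (min_y, max_y) = ordered(bounds.offset.y, bounds.offset.y + bounds.dimensions.y);
    point.x >= min_x && point.x <= max_x && point.y >= min_y && point.y <= max_y
}

fn main() {
    let button = BoundingBox {
        dimensions: Vector2D { x: 100.0, y: 50.0 },
        offset: Point { x: 10.0, y: 20.0 },
    };
    let click = Point { x: 50.0, y: 40.0 };
    println!("click hits button: {}", contains_point(&button, &click));
}
```

The point has to pass the range check on every axis, which is the same condition the `all(...)` call expresses in the iterator version.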
Finally, using my ECS mentioned earlier (the details of which I'll avoid for now), it was fairly straightforward to call this colliding check for every click.
// ... elided ...
(&clickables).par_join().for_each(|clickable| {
    // is_aabb_colliding takes the bounding box by reference
    if is_aabb_colliding(&clickable.bounds, &click_location) {
        // the Clickable was clicked! time to do something here.
    }
});
// ... elided ...
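For readers who haven't used an ECS, the same check can be sketched sequentially over a plain slice. This is a simplified stand-in rather than the engine's code: `Bounds`, the `Action` variants, `contains`, and `actions_at` are all hypothetical, and the parallel `par_join` is swapped for an ordinary iterator.

```rust
/// Hypothetical actions a UI button could trigger.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    StartGame,
    OpenOptions,
    Quit,
}

/// Simplified stand-in for `BoundingBox`: top-left corner plus size.
struct Bounds {
    x: f64,
    y: f64,
    width: f64,
    height: f64,
}

struct Clickable {
    bounds: Bounds,
    action: Action,
}

/// Point-in-box test, edges included (assumes non-negative sizes).
fn contains(bounds: &Bounds, click: (f64, f64)) -> bool {
    click.0 >= bounds.x
        && click.0 <= bounds.x + bounds.width
        && click.1 >= bounds.y
        && click.1 <= bounds.y + bounds.height
}

/// Collect the action of every clickable under the click location.
fn actions_at(clickables: &[Clickable], click: (f64, f64)) -> Vec<Action> {
    clickables
        .iter()
        .filter(|clickable| contains(&clickable.bounds, click))
        .map(|clickable| clickable.action)
        .collect()
}

fn main() {
    // Three menu buttons stacked vertically.
    let menu = vec![
        Clickable { bounds: Bounds { x: 100.0, y: 100.0, width: 200.0, height: 40.0 }, action: Action::StartGame },
        Clickable { bounds: Bounds { x: 100.0, y: 160.0, width: 200.0, height: 40.0 }, action: Action::OpenOptions },
        Clickable { bounds: Bounds { x: 100.0, y: 220.0, width: 200.0, height: 40.0 }, action: Action::Quit },
    ];
    println!("{:?}", actions_at(&menu, (150.0, 170.0)));
    println!("{:?}", actions_at(&menu, (10.0, 10.0)));
}
```

With only a handful of buttons on screen, a sequential pass like this is likely plenty; the parallel version matters more as the number of clickables grows.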
In general, I hope I've described the kind of concerns you have to deal with when processing user input. I haven't even gone over additional challenges like the following:
1. What if you want to present the user with a free-form text input box?
2. How do you handle what should happen once a given button is clicked? A common approach is using a function callback, but the callbacks may need mutable access to the World/Game state, which can be problematic with Rust's borrowing and ownership rules.
3. What about gyroscopic data? Is it handled the same?
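On the second point, one common way to sidestep the borrowing problem is to have buttons carry plain data describing what should happen, and to apply that data to the game state in a separate step. The sketch below shows the idea only. `World`, the `Action` variants, `Button`, and both functions are hypothetical, and this is an alternative to callbacks, not a description of how Grimoire does it.

```rust
/// Hypothetical game state that click handlers want to change.
#[derive(Debug, Default)]
struct World {
    paused: bool,
    score: u32,
}

/// Instead of closures that borrow `World`, each button stores plain data.
#[derive(Debug, Clone, Copy)]
enum Action {
    TogglePause,
    AddScore(u32),
}

struct Button {
    label: &'static str,
    action: Action,
}

/// Phase 1: a read-only pass over the buttons, collecting what was clicked.
fn collect_actions(buttons: &[Button], clicked_labels: &[&str]) -> Vec<Action> {
    buttons
        .iter()
        .filter(|button| clicked_labels.contains(&button.label))
        .map(|button| button.action)
        .collect()
}

/// Phase 2: the only place that needs mutable access to the world.
fn apply_actions(world: &mut World, actions: &[Action]) {
    for action in actions {
        match action {
            Action::TogglePause => world.paused = !world.paused,
            Action::AddScore(points) => world.score += *points,
        }
    }
}

fn main() {
    let buttons = [
        Button { label: "pause", action: Action::TogglePause },
        Button { label: "bonus", action: Action::AddScore(10) },
    ];
    let mut world = World::default();

    let actions = collect_actions(&buttons, &["bonus", "pause"]);
    apply_actions(&mut world, &actions);
    println!("{:?}", world);
}
```

Because the buttons are only read in the first phase and the world is only mutated in the second, no callback ever has to hold a mutable borrow while the UI is being iterated.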
This post is getting quite long so I'll cut it short here for now. In my next post, I'll talk about the other two steps of the game loop: the update and the render.