Godot Engine (godotengine/godot on GitHub) is an open-source, primarily C++-based game engine. It supports both 2D and 3D. As a C++ programmer, I decided to generate a dependency graph of the #include “…” and try to determine the most important files in the graph.
By doing so, I hope to uncover what files are important to look at for beginners who want to contribute to Godot.
I used cinclude2dot2, a Python-based script, to parse the C/C++ source files directly from the Godot source files from its GitHub repository and detect lines with file inclusions. The data was manually cleaned and stripped of its directory paths, leaving only the filenames. This means that files that have duplicate names are merged into one single node in the graph. This is undesirable.
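As a rough illustration of the parsing step described above (this is not the cinclude2dot2 script itself, just a minimal sketch), the following Python scans source text for quoted includes and builds an edge list. The names `quoted_includes`, `build_edges`, and the sample paths are hypothetical. It also shows how stripping directory paths merges same-named files into one node.

```python
import re
from pathlib import PurePosixPath

# Matches quoted includes only, e.g. `#include "core/object.h"`.
# Angle-bracket includes such as <vector> are deliberately ignored.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)

def quoted_includes(source_text):
    """Return the quoted include targets found in a C/C++ source string."""
    return INCLUDE_RE.findall(source_text)

def build_edges(sources, strip_dirs=True):
    """Build (includer, included) edges from a {path: source_text} mapping."""
    def node(path):
        # Stripping directories is what merges same-named files into one node.
        return PurePosixPath(path).name if strip_dirs else path
    edges = set()
    for path, text in sources.items():
        for target in quoted_includes(text):
            edges.add((node(path), node(target)))
    return edges

# Hypothetical sample input, not taken from the real Godot tree.
sources = {
    "core/os/os.cpp": '#include "os.h"\n#include <stdio.h>\n',
    "modules/a/register_types.cpp": '#include "register_types.h"\n',
    "modules/b/register_types.cpp": '#include "register_types.h"\n#include "core/object.h"\n',
}
edges = build_edges(sources)
```

With directories stripped, the two `register_types.cpp` files collapse into a single node, which is exactly the merging problem noted above; passing `strip_dirs=False` keeps them apart.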
I used Gephi to perform the visualizations and all of the metric calculations, running OpenOrd and ForceAtlas2 on the graph with various settings.
First of all, let’s introduce a few concepts.
In-Degree is how many times a file is included by other files.
Out-Degree is how many times a file includes other files.
Degree is the sum of In-Degree and Out-Degree.
PageRank is the famous Googley algorithm for ranking web pages, but it can also be applied to graphs, in general! Here, we use it to rank files in the dependency graph.
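To make the degree definitions concrete, here is a small Python sketch that counts them from an edge list. The function name `degree_tables` and the sample edges are made up for illustration; Gephi computes these metrics for you.

```python
from collections import Counter

def degree_tables(edges):
    """Compute in-degree, out-degree, and degree from (includer, included) edges."""
    in_degree = Counter(dst for _, dst in edges)    # times a file is included
    out_degree = Counter(src for src, _ in edges)   # times a file includes others
    degree = in_degree + out_degree                 # sum of both
    return in_degree, out_degree, degree

# Hypothetical edges: "a.cpp includes object.h", and so on.
sample_edges = [
    ("a.cpp", "object.h"),
    ("b.cpp", "object.h"),
    ("object.h", "variant.h"),
    ("a.cpp", "variant.h"),
]
in_degree, out_degree, degree = degree_tables(sample_edges)
```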
First of all, according to our ranking, we can see that the configuration and access setting files config.h and project_settings.h are quite popular. Unfortunately, config.h is an amalgam of 5 different files at this time, but all reside within the thirdparty/ directory:
So, we can conclude that Godot (or, rather, code that Godot uses) is focusing on these libraries quite a lot. Nothing too big, but possibly something we may want to break up later if we want a closer look at it.
os.h is quite an important file, as it is a class that deals with changing video modes, querying the operating system for memory usage, and so on and so forth.
Strangely enough, important classes like Object are not showing up here! In-Degree is quite a naive measure of the importance of nodes in a graph; you’ll see better ones later.
We can interpret out-degree as “the most complex files to understand” if we assume that we must understand the content coming from headers that it references. It is fitting, then, that the Godot editor and scene manager show up in this list. However, the large numbers become less scary if we consider that register_types.cpp is actually the name of over 30 distinct files…
You may have noticed that certain files with the “bt” prefix keep coming up. What is this? This is the Bullet physics engine, which has been used in many open-source projects as well as in commercial projects (even movies!). It makes sense that Bullet files show up quite highly here, as physics and collision are no simple tasks to accomplish and understand.
PageRank is an algorithm that derives a node’s weights based off of the weights of other nodes, favoring connections that direct to the node rather than away from the node. Thus, PageRank not only creates a more “global” metric, but it also favors nodes on the receiving side of a dependency reference.
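To show how a node’s weight can depend on the weights of the nodes pointing at it, here is a minimal power-iteration sketch in Python. It is not Gephi’s implementation; the damping factor of 0.85, the iteration count, and the sample file names are assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                # Each node passes its rank along its outgoing edges.
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

# Hypothetical graph: three sources include object.h, which includes variant.h.
sample_edges = [
    ("a.cpp", "object.h"),
    ("b.cpp", "object.h"),
    ("c.cpp", "object.h"),
    ("object.h", "variant.h"),
]
rank = pagerank(sample_edges)
```

In this toy graph `variant.h` has an in-degree of only 1, yet it ends up with the highest rank because the one file that includes it is itself heavily included. That is the “global” effect that in-degree alone misses.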
So, perhaps, when one is going through Godot, it would be helpful to keep an eye out for some of these! Odds are you’ll encounter a header file on this list, and odds are that most other people will too.
Centrality metrics are great for figuring out what nodes are at “the core/central point” of the graph. Here, we can finally see that Object, “the base class for almost everything,” takes a fitting place at the top of the ranking; even the official documentation thinks it’s important! Variant, “the most important class in the engine,” also shows up at #2. One highly ranked class that does not yet have its own guide in the documentation is the GUI-focused class, Control. control.h is influenced by and influences a large amount of the codebase, which is fitting if we consider that Godot Engine’s editor is built within its own GUI framework.
Sidenote: I personally think that Godot’s feat of building its own editor as a Godot game is quite a cool and unique thing! I think this deserves more exposure.
editor_node.h and node.h seem to form the backbone of the editor and the scene engine, respectively — they are, perhaps, prime candidates for study for getting an intimate look at Godot.
Many of the classes here hint at fundamental systems such as imaging, the main engine loop, or the editors themselves.
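For readers curious how a centrality score such as betweenness can be computed, here is a compact Python sketch of Brandes’ algorithm for an unweighted directed graph. It is an illustration only: Gephi did the real calculation, and the function name and sample edges are hypothetical.

```python
from collections import deque

def betweenness(edges):
    """Unnormalized betweenness centrality (Brandes) on a directed graph."""
    edge_set = set(edges)  # ignore duplicate edges
    nodes = sorted({n for e in edge_set for n in e})
    adj = {n: [] for n in nodes}
    for src, dst in edge_set:
        adj[src].append(dst)
    score = {n: 0.0 for n in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, dist = [], {s: 0}
        sigma = {n: 0 for n in nodes}
        sigma[s] = 1
        preds = {n: [] for n in nodes}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {n: 0.0 for n in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Hypothetical include chain.
sample_edges = [
    ("editor.cpp", "node.h"),
    ("scene.cpp", "node.h"),
    ("node.h", "object.h"),
    ("object.h", "variant.h"),
]
scores = betweenness(sample_edges)
```

A file scores highly when many shortest include paths between other files pass through it, which is why “backbone” headers tend to rise to the top.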
In Gephi, I ran the modularity detection to attempt to cluster the Godot codebase into different clusters, except, this time, I decided to weight the node size based on betweenness centrality because it tended to agree with the official documentation’s placement of importance on Variant and Object. Below is the new picture, with only the 7 largest clusters colorized:
Cluster 1: Core and Editor (26%)
Cluster 2: main, config, and libraries (14.53%)
Cluster 3: Object, Variant (14.11%)
Cluster 4: Bullet library (9.2%)
Cluster 5: libvpx (6.75%)
One interesting thing we found out about Godot is that a significant amount of its codebase is not really part of Godot itself. At least 10-15% of the code within the repository can be expected to be fairly isolated but still heavily referenced by other libraries. Perhaps contributors who begin to encounter these high-influence files may find themselves having to learn these libraries to push their development to the next level.
We have revealed that a majority of the code in Godot is rightfully influenced by and influences many of the files under the core/ directory, and have identified some important candidates that may be suitable for further documentation. This analysis confirms the notion that Variant and Object are two extremely important classes in the Godot engine, though there are many other important ones which influence their areas of the codebase quite significantly as well.
Many, but not all, of the pictures and data are included in the zip file below:
I’ve been trying to read papers for a while in a few places in computer science, and I’ve come to the conclusion that trying to understand the “big picture” in research is sort of a mess.
I don’t have any evidence, but it’s just a feeling.
In my network science class, I came across a piece of evidence. Barabasi, a well-known researcher in network science, in his personal introduction to his textbook Network Science, explicitly says that “I still find puzzling how disjoint were the communities that were thinking about networks before 1999.”
I keep asking my friends: “Isn’t there some lovely visualization or navigation tool that can help me understand research in general?” Besides some of the other research tools, I can’t find a great answer with great consensus. So it appears that the practice of research aggregation lacks centralization of methodology — there’s no norm for how to do it…
Do we just need to figure out a norm or create a tool to help us do this? My shades are tinted right now, but I bet graph theory could be a great start for a foundation of what papers are important… Hm…
Disclaimer: at least at my school. And in years of lurking the Internet.
#1 – Libraries (and making them too)
Most people will NOT teach you libraries. Anything from Qt in C++ to Django in Python is less likely to show up than assignments to roll your own. Of course, some fields, like graphics and compilers, will use more standard tools and libraries (OpenGL, ANTLR/yacc/bison).
Now, making your own library? I’ve rarely seen that being emphasized in discussion on the Internet but you will surely write code that will be used by somebody else. In a way, many people make libraries for each other everyday — it’s just not the primary idea.
How do I learn it?
Create your own. Look at others. Documentation is about 50% of using a library, so writing it should be 50% of making a library. Look into your language documentation libraries, like Doxygen for C++ and Javadoc for Java.
If you think about it, operating systems are like the O.G. libraries. They’re just such a part of the system that they have their own club now.
#2 – Build Systems
I find it strange that build systems are not covered as a first-class... well... class (double pun). They deserve their own semester-long class, in my opinion. How many times do you use a build system and need to understand your dependencies and how they fit together? In my short experience in working, build systems are everything, from the compilation and customization process all the way up to deployment and support.
How do I learn it?
C++ has no good universal build system. The closest ones are CMake and Visual Studio. Solutions like Meson are up-and-coming, but often have larger dependencies like Python.
Java is probably the best example of having build systems with Maven and Gradle. It’s easier to integrate with libraries since repositories are well-established and JVM is a cross-platform blessing.
Start with a continuous integration tool and actually use it — TravisCI, Circle CI, and Jenkins are popular for open-source.
#3 – Rigorous Testing and Validation
Testing and validation may be covered, but people will not usually go into the differences between dumb mocks and spies and why you should care. It appears that testing is often a big part of the deployment process for larger companies. Additionally, static analysis may be used at more careful organizations in aerospace and hardware work. The balance between the perfect ideals of correctness and soundness and “just work, damn you!” is something that you rarely get formally trained in.
How do I learn it?
Learn the difference between unit and integration tests, and if you’re headed for an OOP place, maybe they’ll be using something like Google Mock/Google Test or EasyMock/JUnit.
Explore standard debugging tools and some basic static analysis tools. For starters, try Cobertura and FindBugs in Java. If you’re really interested, learn a little more about your language’s type system and the benefits and downsides it has.
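To make the mock-versus-spy distinction a little more concrete, here is a small sketch using Python’s standard `unittest.mock` module rather than Google Mock or EasyMock. The `Mailer` class and `notify` function are made up for the example.

```python
from unittest.mock import Mock

class Mailer:
    """Hypothetical collaborator that the code under test depends on."""
    def send(self, address, body):
        return f"sent to {address}"

def notify(mailer, address):
    """Code under test: delegates to whatever mailer it is given."""
    return mailer.send(address, "build finished")

# Mock: a stand-in with canned behaviour. The real send() never runs.
mock_mailer = Mock(spec=Mailer)
mock_mailer.send.return_value = "stubbed"
mock_result = notify(mock_mailer, "dev@example.com")

# Spy: wraps a real object. The real send() runs, and calls are recorded.
spy_mailer = Mock(wraps=Mailer())
spy_result = notify(spy_mailer, "dev@example.com")
spy_mailer.send.assert_called_once_with("dev@example.com", "build finished")
```

The mock isolates `notify` from the real collaborator, while the spy lets the real behaviour through and only observes it. Which one you want depends on whether the collaborator is cheap and safe to run in a test.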
#4 – Critical Diagnosis and Debugging
Nobody teaches you how to diagnose a bug definitively because we’re not there yet. The tools vary too much across industries. Heck, even debugging formats are not standardized across platforms, and if they are, they’re not de facto standards.
For all our formal knowledge about static analysis and correctness in the years of academic research, it appears that rigor has not bled into the blood of common CS discussion. Somewhere, people are using FindBugs at top companies, and people are not being taught the working and practical use of it.
How do I learn it?
No idea, but algorithms and experience just seem to help. Tools can help with diagnosis. Tests and static analysis are a part of the process.
This semester, I started and led a team of 10+ people to complete a basic Scheme interpreter with C++ — it’s up on GitHub as Shaka Scheme, and is still missing macros. It’s also pretty raw, so brace yourself.
People say Lisp is so easy to implement. “So easy.” Sure, the idea that “it’s easy” is not up-front and everywhere, but it’s told in whispers around the interwebs:
“It’s just so easy to write a Scheme implementation.”
It’s just SO easy to write a Lisp interpreter!
Actually, I agree. Lisp, in its original form, was extremely simple — it was designed to manipulate what are basically linked lists and do it dynamically, and do it well.
It has a simple grammar. It’s context-free, and thus we can use the various context-free parsing algorithms. The particular ones of note are recursive descent (the approach currently used by GCC and Clang for parsing C++) and LR parsers, which things like yacc and bison generate.
Homoiconicity is useful. Since everything is a list and all code is basically in the form of S-expressions (lists), everything can be manipulated super easily.
Macros are super powerful. Lisp macros work on expressions instead of tokens like C preprocessor macros. This means macros are powerful enough to generate ANY type of code.
The implementation is easy. The examples above show pretty clearly that a Lisp can be very minimal.
So yes. Lisp is easy, and fun.
Scheme is not. And this is “The Big Lie” — Scheme is a Lisp, but Scheme is definitely NOT JUST Lisp.
Why is Scheme harder than Lisp to make work?
Scheme is a modern Lisp dialect, first created at MIT, and now on its seventh iteration with the Revised (to the 7th power) Report on the Algorithmic Language Scheme specification (R7RS).
Even though Scheme has a reputation for being minimalist, its implementation is Lisp, but on steroids:
Macros are not so simple. R7RS Scheme has template macros, which are declarative rather than procedural — in other words, it has its own pattern language that it matches expressions to (similar to Haskell’s pattern matching) and then has its own template expression language as well with certain special semantics.
First-class continuations are time-machines. They require you to keep an explicit control stack and have a way to save and restore the state of the control stack. This means the evaluation system must have a way of expressing it as an explicit, reified item. No implicit recursion variables — it must ALL be saved.
Circular lists are valid in Scheme. We have datum recursive list notation to thank for that. It’s not covered as a section in R7RS, and, like any details on garbage collection or memory management algorithms, the document is almost silent on the matter itself.
Meta-circular evaluators should go away if you’re doing Scheme — essentially, you’ll have to start playing with an intermediate form like continuation passing style (CPS) in order to solve the continuation problem or have reified continuations through exposing it directly in your evaluation system, and you’ll have to start playing with statically-analyzed scopes to deal with macros.
At this point, your evaluator will probably start looking more like a compiler. This is why implementations like Chez and Guile already compile to native code or bytecode.
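To give a feel for the declarative template macros mentioned above, here is a deliberately tiny Python sketch of pattern matching and template instantiation over nested lists. It is a simplification under stated assumptions: symbols are Python strings, there is no ellipsis support, no literals list, and no hygiene, all of which real syntax-rules requires. The `my-if` rule is invented for the example.

```python
def match(pattern, form, bindings):
    """Match a form against a pattern, recording pattern-variable bindings."""
    if isinstance(pattern, str):          # every symbol is a pattern variable
        bindings[pattern] = form
        return True
    if isinstance(pattern, list):
        return (isinstance(form, list)
                and len(form) == len(pattern)
                and all(match(p, f, bindings) for p, f in zip(pattern, form)))
    return pattern == form                # literal datum such as a number

def instantiate(template, bindings):
    """Fill a template, replacing bound symbols and keeping the rest as-is."""
    if isinstance(template, str):
        return bindings.get(template, template)
    if isinstance(template, list):
        return [instantiate(t, bindings) for t in template]
    return template

def expand(rule, form):
    """Apply one (pattern, template) rule; return None if it does not match."""
    pattern, template = rule
    bindings = {}
    # The keyword position (the macro's own name) is skipped when matching.
    if match(pattern[1:], form[1:], bindings):
        return instantiate(template, bindings)
    return None

# Hypothetical rule: (my-if c t e) => (cond (c t) (else e))
rule = (["my-if", "c", "t", "e"], ["cond", ["c", "t"], ["else", "e"]])
expanded = expand(rule, ["my-if", ["<", "x", 0], "neg", "pos"])
```

Even this toy version needs two small languages (patterns and templates). Adding ellipses and hygiene is where most of the real complexity lives.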
An iterative approach is necessary because the more straightforward recursive approach of the meta-circular interpreter presented in Chapter 2 cannot properly support continuations or tail calls. … Tail calls cannot be properly supported unless the implementation (at the meta level) supports them properly.
So the “circular” part is sort of problematic — the information required to support continuations should be explicit and not implicit as with recursion in order to make sure that it can be saved and restored as data.
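As a sketch of what “explicit and not implicit” can look like, here is a very small evaluator written in continuation-passing style in Python. It is only an illustration: symbols are Python strings, `lambda` and `call/cc` are hard-wired keywords for brevity, and because Python has no tail-call elimination this version still grows the host stack. All names are hypothetical.

```python
def evaluate(expr, env, k):
    """Evaluate expr in env, passing the result to the continuation k."""
    if isinstance(expr, str):                  # variable reference
        return k(env[expr])
    if not isinstance(expr, list):             # literal, e.g. a number
        return k(expr)
    head = expr[0]
    if head == "lambda":                       # (lambda (params...) body)
        _, params, body = expr
        def closure(args, k2):
            local_env = {**env, **dict(zip(params, args))}
            return evaluate(body, local_env, k2)
        return k(closure)
    if head == "call/cc":                      # (call/cc receiver)
        def with_receiver(receiver):
            # The reified continuation drops its caller's k2 and resumes k.
            def escape(args, k2):
                return k(args[0])
            return receiver([escape], k)
        return evaluate(expr[1], env, with_receiver)
    # Procedure call: evaluate operator and operands left to right.
    def eval_all(items, done, k_all):
        if not items:
            return k_all(done)
        return evaluate(items[0], env,
                        lambda v: eval_all(items[1:], done + [v], k_all))
    return eval_all(expr, [], lambda vals: vals[0](vals[1:], k))

# Primitives take (args, continuation), like closures do.
global_env = {"+": lambda args, k: k(args[0] + args[1])}

# (+ 1 (call/cc (lambda (k) (+ 10 (k 5)))))  escapes with 5
escaped = evaluate(
    ["+", 1, ["call/cc", ["lambda", ["k"], ["+", 10, ["k", 5]]]]],
    global_env, lambda v: v)

# (+ 1 (call/cc (lambda (k) 10)))  never invokes k
normal = evaluate(
    ["+", 1, ["call/cc", ["lambda", ["k"], 10]]],
    global_env, lambda v: v)
```

Because “the rest of the computation” is always an ordinary value `k`, capturing it for call/cc is just a matter of handing that value to user code. In the first example the pending `(+ 10 ...)` is abandoned when `k` is invoked.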
When people say “Lisp,” what comes to mind? Is it Clojure? Scheme? Common Lisp? I now think a lot more about which Lisp people actually mean, because they all look the same but come out differently in terms of semantics, and sometimes even syntax.
I was fooled by the misconception that “Lisp interpreters are easy” still applied to modern Lisps (particularly Scheme, since it has a reputation as “the simple modern Lisp”). I’m pretty sure that when people say “Lisp” in this context, they mean something closer to the original McCarthy Lisp than to modern Scheme or modern Common Lisp.
A: Matthew Flatt is the most active contributor to Racket as measured by GitHub’s commit graph, circa 2017. He is also a member of the PLT Racket or Racket contributor group, and a faculty member at the University of Utah.
Q: Okay, so what?
A: Racket’s macro system recently got an overhaul to implement a new syntax system that keeps track of scopes. This is effectively so that they can implement hygienic macros more easily (e.g. syntax-case). Essentially, every single identifier can be associated with a scope given that the primitive expressions are indeed the primitive expressions.
Over the past few months, I’ve started the shaka-scheme project for my senior project at the University of Hawaii at Manoa, and I wanted to highlight two pieces of code that recently changed our entire course of development:
(define proc1 (lambda ()
  (define define 1)
  (display "define is actually ")
  ; This doesn't display a procedure literal...
  (display define)
  (display " in this scope\n")))
(define proc2 (lambda ()
  (define-syntax define
    (syntax-rules ()
      [(define a b) (display "fooled you\n")]))
  ; This doesn't do the normal define...
  ; And it's not an error.
  (define 1 2)))
In Scheme, you can redefine what are called primitive expression symbols inside of non-global scopes, such as within a lambda expression. If you run this using GNU Guile or Racket, you’ll find that this code runs flawlessly — it prints out:
On the Shaka Scheme project, we decided to go with a tree-based approach — essentially, we would build an AST in-memory during parsing, and then evaluate the AST through tree traversal.
This seemed like a relatively simple way of doing things. We would adopt a supposedly more efficient primitive (the IDataNode tree with children stored in a std::vector), and we would also get a nice AST if we ever wanted to debug in the future.
Unfortunately, the design called for strict primitive expression forms that assumed that (define a b) was the only valid form, because why wouldn’t it be? Of course, that all changed once we realized that (define a b c) could be valid in a certain context.
In that instant, the “fixed” structure of the various AST specifications we were using to represent the parsed forms of the various Scheme expressions basically broke. In addition, the fixed “parsing” grammar defined for the top-level <expression> rule can no longer look ahead for (define and correctly assume that the rest of the <define> rule will look like: ( define <identifier> <expression> ) as define itself could possibly be a completely different procedure… or even just 1.
The solution is to simply redo the entire evaluation and parsing specification to make NO ASSUMPTIONS.
Scheme, like most other Lisps, is notorious for being runtime-heavy in the sense that figuring out the types of things and the bindings of identifiers is harder to do statically without going down the rabbit hole of more rigorous analysis. This rings especially true here, as almost any primitive expression keyword can be a completely different procedure given the right context and redefinitions. Therefore, all parsing must be agnostic to the idea that define means only the usual define procedure, and every single expression is simply a list waiting to be evaluated (unless, of course, you can prove that it does represent the usual define expression in that context).
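One way to picture “no assumptions” is an evaluator that resolves the head of every list through the environment before deciding what the expression means. The Python below is a sketch of that idea only, not the Shaka Scheme design: symbols are Python strings, and `SpecialForm`, `evaluate`, and the environments are hypothetical.

```python
class SpecialForm:
    """Marker for a syntax keyword, stored in the environment like any value."""
    def __init__(self, name):
        self.name = name

def evaluate(expr, env):
    if isinstance(expr, str):              # symbol: look it up
        return env[expr]
    if not isinstance(expr, list):         # literal, e.g. a number
        return expr
    # Resolve the head first; only then do we know what kind of form this is.
    head = evaluate(expr[0], env)
    if isinstance(head, SpecialForm) and head.name == "define":
        env[expr[1]] = evaluate(expr[2], env)
        return None
    if callable(head):
        return head(*[evaluate(arg, env) for arg in expr[1:]])
    raise TypeError(f"not a procedure: {head!r}")

global_env = {"define": SpecialForm("define"), "+": lambda a, b: a + b}
evaluate(["define", "x", ["+", 1, 1]], global_env)   # the usual define

# Inside a nested scope, `define` can be rebound to a plain value.
local_env = dict(global_env)
evaluate(["define", "define", 1], local_env)
shadowed = evaluate("define", local_env)             # now just the number 1
```

After the rebinding, `(define y 3)` in that scope is no longer a definition at all. It is an attempt to call the number 1, and the evaluator can only know that by consulting the environment.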
Second of all, Scheme lists may not be more elegant when stored as in-memory trees where the list is represented as a root with its children as the elements. The reason is that car and cdr may require taking slices of an array or vector of children, and keeping track of the size of slices of a vector is somewhat more cumbersome than simply using pointer access to check for the existence of the next element. Why keep track of slices or trees if one can simply use the usual, canonical representation of Scheme lists as linked lists?
Third of all, Scheme R7RS macros become significantly more complicated without the uniform structure of “everything is a list,” as traversing non-uniform trees makes parsing macro-derived expression types harder to reason about.
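The difference is easy to see in a few lines. This Python sketch (with a made-up `Pair` class, using `None` as the empty list) shows that cdr on a linked list is a single pointer read that shares structure, whereas the vector-of-children approach has to copy or track a slice.

```python
class Pair:
    """A cons cell: car holds an element, cdr points at the rest of the list."""
    __slots__ = ("car", "cdr")
    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr

def from_iterable(items):
    """Build a linked list from a Python iterable; None is the empty list."""
    result = None
    for item in reversed(list(items)):
        result = Pair(item, result)
    return result

def to_python_list(pair):
    """Walk the cdr pointers until the empty list is reached."""
    out = []
    while pair is not None:
        out.append(pair.car)
        pair = pair.cdr
    return out

linked = from_iterable([1, 2, 3])
rest = linked.cdr                 # O(1): no copying, no size bookkeeping
extended = Pair(0, linked)        # shares the whole original list as its tail

children = [1, 2, 3]              # vector-of-children representation
rest_slice = children[1:]         # copies the remaining elements
```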
For example, parsing (let ((x 1) (y 2)) (+ x y)) (which is a classic macro-implemented or derived expression type, according to R7RS) requires knowledge that
let is a macro
((x 1) (y 2)) is not a procedure call, but a list of syntax primitives that will be manipulated by the macro
How is one to decide these things before actually evaluating the value of the let identifier in the current environment? There is no answer without more rigorous static analysis of the scope in question.
I personally chose Scheme as “the easy” language to implement as a semester-long “programming language-focused” project in C++. Unfortunately, the extremely dynamic nature of Scheme (let’s not get started on continuations…) means that implementing its finer parts is definitely not a trivial task.
Lisp is an especially interesting case for evaluation, because so few things are known at compile-time. The extremely flexible semantics of Scheme gives its implementation a sort of hardness that surpasses that of something like a MIPS processor simulator.
If you’re still not convinced of the non-toyness of Scheme, why not read some of the latest Scheme standard, R7RS? Scheme is a relatively full-featured language compared to the Lisps of old, and sports the following features:
Continuations with call/cc
Expression-based macros with the define-syntax/syntax-rules system (standing apart from the R6RS syntax-case macro system and the classic defmacro system)
Note that C has token-based macros with its preprocessor. Expression-based macros are more powerful.
A full-featured numeric system, with support for true fractional/rational types, inexact/exact number differentiations, and complex numbers.
Optional lazy evaluation
Records (also known as product types and comparable to C’s struct)
Strings, with some optional Unicode support
Bytevectors and vectors, which represent linear data structures of fixed size
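The numeric system in the list above is one of the features that is easy to underestimate. As a loose analogy only (these are Python’s types, not Scheme’s), the snippet below shows the kind of distinction between exact rationals, inexact reals, and complex numbers that an implementation has to preserve.

```python
from fractions import Fraction

# Exact rationals: arithmetic stays exact, like Scheme's 1/3 + 1/6 => 1/2.
exact_sum = Fraction(1, 3) + Fraction(1, 6)

# Inexact reals: floating point carries rounding error.
inexact_sum = 0.1 + 0.2

# Converting exact to inexact is a separate, explicit step.
as_inexact = float(exact_sum)

# Complex numbers are part of the tower as well.
i_squared = 1j * 1j
```

An interpreter has to decide, for every arithmetic primitive, what happens when these kinds of numbers are mixed, which is a surprising amount of work for a “minimal” language.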