Starting to mock Shock in Python

I’ve been working off-and-on with Shock, my “programming language IDE/shell” in Rust, but I feel too slow when working in Rust. The productivity and working pace is just very slow.

But when I work in Python, I feel a lot more productive. For example, I was able to pick this back up and submit changes to revamp the editing experience in the Python mock-up here.

I think I’ll stick with Python for future prototyping work in Python for Shock. It also helps that I’ve been practicing Python as my primary language for coding interviews 🙂

Until the design of Shock is more fleshed out and better tested, I’ll continue to chug along.

Ranking Important Source Files in Godot Engine (3.0.2-stable)

Godot Engine (godotengine/godot on GitHub) is an open-source, primarily C++-based game engine. It supports both 2D and 3D. As a C++ programmer, I decided to generate a dependency graph of the #include “…” and try to determine the most important files in the graph.

By doing so, I hope to uncover what files are important to look at for beginners who want to contribute to Godot.

The final graph, as visualized in Gephi.


I used cinclude2dot2, a Python-based script, to parse the C/C++ source files directly from the Godot source files from its GitHub repository and detect lines with file inclusions. The data was manually cleaned and stripped of its directory paths, leaving only the filenames. This means that files that have duplicate names are merged into one single node in the graph. This is undesirable.

I used Gephi to perform the visualizations and all of the metric calculations, running OpenOrd and ForceAtlas2 on the graph with various settings.



First of all, let’s introduce a few concepts.

  • In-Degree is how many times a file is included by other files.
  • Out-Degree is how many times a file includes other files.
  • Degree is the sum of both In-Degree and Out-Degree
  • PageRank is the famous Googley algorithm for ranking web pages, but it can also be applied to graphs, in general! Here, we use it to rank files in the dependency graph.


The top 20 Godot C++ source files (with directory names stripped) ranked by In-Degree, or by how many files include these files.



First of all, according to our ranking, we can see that the configuration and access setting files config.h and project_settings.h are quite popular. Unfortunately, config.h is an amalgam of 5 different files at this time, but all reside within the thirdparty/ directory:


So, we can conclude that Godot (or, rather, code that Godot uses) is focusing on these libraries quite a lot. Nothing too big, but possibly something we may want to break up later if we want a closer look at it.

os.h is quite an important file, as it is a class that deals with changing video modes, querying the operating system for memory usage, and so on and so forth.

Strangely enough, important classes like Object are not showing up here! In-Degree is quite a naive measurement for important of nodes in a graph — you’ll see better ones later.


The top 20 files in Godot according to Out-Degree, or how many times it references other files.

We can interpret out-degree as “the most complex files to understand” if we assume that we must understand the content coming from headers that it references. It is fitting, then, that the Godot editor, and scene manager show up in this list. However, the large numbers become less scary if we consider that register_types.cpp is actually the name of over 30 distinct files…

You may have noticed that certain files with the “bt” prefix keep coming. What is this? This is the Bullet physics engine that has been used in many open-source projects, as well as in commercial projects (even movies!) It makes sense that Bullet files show up quite highly here, as physics and collision are no simple tasks to accomplish and understand.



PageRank is an algorithm that derives a node’s weights based off of the weights of other nodes, favoring connections that direct to the node rather than away from the node. Thus, PageRank not only creates a more “global” metric, but it also favors nodes on the receiving side of a dependency reference.

We can see quite a large change from the other metrics. We can see more important files like Godot’s reimplementation of vector, “basic functions and definitions to be used everywhere” in typedef.h, and Godot’s String class.

So, perhaps, when one is going through Godot, it would be helpful to keep an eye out for some of these! Odds are you’ll encounter a header file on this list, and odds are that most other people will too.

Betweenness Centrality

The top 20 files in Godot ranked by betweenness centrality or by the number of shortest paths between nodes in the graph that pass through a vertex.

Centrality metrics are great for figuring out what nodes are at “the core/central point” of the graph. Here, we can finally see that Object, “the base class for almost everything,” takes a fitting place at the top of the ranking — even the official documentation thinks its important! Variant, “the most important class in the engine,” also shows up at #2. One result that isn’t well-documented with a custom guide for it within the documentation is the GUI-focused class, Controlcontrol.h is influenced by and influences a large amount of the codebase, which is fitting if we consider that Godot Engine’s editor is built within its own GUI framework.

  • Sidenote: I personally think that Godot’s feat of building its own editor as a Godot game is quite a cool and unique thing! I think this deserves more exposure.

editor_node.h and node.h seem to form the backbone of the editor and the scene engine, respectively — they are, perhaps, prime candidates for study for getting an intimate look at Godot.

Many of the classes here hint at fundamental systems such as imaging, the main engine loop, or the editors themselves.

Important Clusters/Modules

In Gephi, I ran the modularity detection to attempt to cluster the Godot codebase into different clusters, except, this time, I decided to weight the node size based on betweenness centrality because it tended to agree with the official documentation’s placement of importance on Variant and Object. Below is the new picture, with only the 7 largest clusters colorized:

The new graph layout with betweenness centrality ranking the size of the nodes.

Cluster 1: Core and Editor (26%)

The largest unsupervised cluster detected by Gephi. It accounts for 26% of the files in Godot, and seems to encompass Object, Node, Control, and various other nodes in the core/ directory.

Cluster 2: main, config, and libraries (14.53%)

The second largest cluster seems to encompass main.h and various other libraries that are external to Godot.

Cluster 3: Object, Variant (14.11%)

Strangely enough, this cluster may belong with the first one, but it could very well easily form its own with the documentation-proclaimed “two very important classes” Variant and Object present here.

Cluster 4: Bullet library (9.2%)

Although the labels are hard to read even in the PDF, this cluster represents the Bullet physics library that Godot uses. Many of their files are prefixed with “bt”.

Cluster 5: libvpx (6.75%)

This cluster seems to include the influential vpx_config.h, a sign that this cluster most likely represents the sub-part of Godot that relies on libvpx. This is a video encoding library.


One interesting thing we found out about Godot is that a significant amount of its codebase is not even particularly a part of Godot itself. There is at least 10-15% of Godot engine code within the repository that one can expect to be fairly isolated but still heavily referenced by other libraries — perhaps worker who begin to encounter these high-influence files may find themselves having to learn these libraries to push their development to the next level.

We have revealed that a majority of the code in Godot is rightfully influenced by and influences many of the files under the core/ directory, and have identified some important candidates that may be suitable for further documentation. This analysis confirms the notion that Variant and Object are two extremely important classes in the Godot engine, though there are many other important ones which influence their areas of the codebase quite significantly as well.


Many, but not all of the pictures and data is included in the zip file below:


Is It Just Me, Or Is Academic Research Fragmented?

I’ve been trying to read papers for awhile in a few places in computer science, and I’ve come to the conclusion that trying to understand the “big picture” in research is sort of a mess.

I don’t have any evidence, but it’s just a feeling.

In my network science class, I came across a piece of evidence. Barabasi, a well-known researcher in network science, in his personal introduction to his textbook Network Science, explicitly says that “I still find puzzling how disjoint were the communities that were thinking about networks before 1999.”

I keep asking my friends: “Isn’t there some lovely visualization or navigation tool that can help me understand research in general?” Besides some of the other research tools, I can’t find a great answer with great consensus. So it appears that the practice of research aggregation lacks centralization of methodology — there’s no norm for how to do it…


Do we just need to figure out a norm or create a tool to help us do this? My shades are tinted right now, but I bet graph theory could be a great start for a foundation of what papers are important… Hm…


Things They Won’t Teach You In School

Disclaimer: at least at my school. And in years of lurking the Internet.

#1 – Libraries (and making them too)

Most people will NOT teach you libraries. Anything from Qt in C++ to Django in Python is going to be less likely than assignments to roll your own. Of course, some fields, like graphics and compilers will use more standard tools and libraries (OpenGL, ANTLR/yacc/bison).

Now, making your own library? I’ve rarely seen that being emphasized in discussion on the Internet but you will surely write code that will be used by somebody else. In a way, many people make libraries for each other everyday — it’s just not the primary idea.

How do I learn it?

Create your own. Look at others. Documentation is about 50% of using a library, so writing it should be 50% of making a library. Look into your language documentation libraries, like Doxygen for C++ and Javadoc for Java.

If you think about it, operating systems are like the O.G. libraries. They’re just such a part of the system that they have their own club now.

#2 – Build Systems

I find it strange how people do not cover build systems as a first-class…. well… class (double pun). It should deserve it’s own semester class, in my opinion. How many times do you use a build system and you are required to understand your dependencies and how they fit together? In my short experience in working, build systems are everything, from the compilation and customization process all the way up to deployment and support.

How do I learn it?

C++ has no good universal build system. The closest one is CMake or Visual Studio. Solutions like Meson are up-and-coming, but often have larger dependencies like Python.

Java is probably the best example of having build systems with Maven and Gradle. It’s easier to integrate with libraries since repositories are well-established and JVM is a cross-platform blessing.

Start with a continuous integration tool and actually use it — TravisCI, Circle CI, and Jenkins are popular for open-source.


#3 – Rigorous Testing and Validation

Testing and validation may be covered, but people will not usually go into the differences between dumb mocks and spies and why you should care. It appears that testing is often a big part of the deployment process for larger companies. Additionally, static analysis may be used at more careful organizations in aerospace and hardware work. The balance between the perfect ideals of correctness and soundness and “just work, damn you!” is something that you’re not often to get formally trained in.

How do I learn it?

Learn the difference between unit and integration tests, and if you’re headed for an OOP place, maybe they’ll be using something like Google Mock/Google Test or EasyMock/Junit.

Explore standard debugging tools and some basic static analysis tools. For starters, try Cobertura and FindBugs in Java. If you’re really interested, learn a little more about your language’s type system and the benefits and downsides it has.

#4 – Critical Diagnosis and Debugging

Nobody teaches you to how to diagnose a bug definitively because we’re not there yet. The tools vary too much across industries — heck, even debugging formats are not standardized across platforms, and if they are, they’re not de facto standards.

For all our formal knowledge about static analysis and correctness in the years of academic research, it appears that rigor has not bled into the blood of common CS discussion. Somewhere, people are using FindBugs at top companies, and people are not being taught the working and practical use of it.

How do I learn it?

No idea, but algorithms and experience just seem to help. Tools can help with diagnosis. Tests, static analysis are a part of the process.





Doing Scheme – The Big Lie

This semester, I started and led a team of 10+ people to complete a basic Scheme interpreter with C++ — it’s up on GitHub as Shaka Scheme, and is still missing macros. It’s also pretty raw, so brace yourself.

People say Lisp is so easy to implement. So easy.” Sure, the idea that “it’s easy” is not up-front and everywhere, but it’s told in whispers around the interwebs:

It’s just SO easy to write a Lisp interpreter!

Actually, I agree. Lisp, in its original form, was extremely simple — it was designed to manipulate what are basically linked lists and do it dynamically, and do it well.

It’s elegant:

  • It has a simple grammar. It’s context-free, and thus, we can use the various context-free parsing algorithms — the particular ones of note are the LL(k) recursive descent (currently used by GCC and LLVM for parsing C++) and LR parsers, which things like yacc and bison generate.
  • Homoiconity is useful. Since everything is a list and all code is basically in the form of S-expressions (lists), everything can be manipulated super easily.
  • Macros are super powerful. Lisp macros work on expressions instead of tokens like C preprocessor macros. This means macros are powerful enough to generate ANY type of code.
  • The implementation is easy. The above examples are a pretty clear example that Lisp can be very minimal.

So yes. Lisp is easy, and fun.

Scheme is not. And this is “The Big Lie” — Scheme is a Lisp, but Scheme is definitely NOT JUST Lisp.

Why is Scheme harder than Lisp to make work?

Scheme is a modern Lisp dialect, first created at MIT, and now on its seventh iteration with the Revised (to the 7th power) Report on the Algorithmic Language Scheme specification (R7RS).

Even though Scheme has a reputation for being minimalist, it’s implementation is Lisp, but on steroids:

  • Macros are not so simple. R7RS Scheme has template macros, which are declarative rather than procedural — in other words, it has its own pattern language that it matches expressions to (similar to Haskell’s pattern matching) and then has its own template expression language as well with certain special semantics.
  • First-class continuations are time-machines. They require you to keep an explicit control stack and have a way to save and restore the state of the control stack. This means the evaluation system must have a way of expressing it as an explicit, reified item. No implicit recursion variables — it must ALL be saved.
  • Circular lists are valid in Scheme. We have datum recursive list notation to thank for that. It’s not covered as a section in R7RS, and, like any details on garbage collection or memory management algorithms, the document is almost silent on the matter itself.

Meta-circular evaluators should go away if you’re doing Scheme — essentially, you’ll have to start playing with an intermediate form like continuation passing style (CPS) in order to solve the continuation problem or have reified continuations through exposing it directly in your evaluation system, and you’ll have to start playing with statically-analyzed scopes to deal with macros.

At this point, your evaluator will probably start looking more like a compiler. Hence, this is why implementations like Chez and Guile compile to native or bytecode already.

Are you sure?

There’s been ways of getting around the two, but according to R. Kent Dybvig (who is a prolific writer on Scheme and has been heavily involved with Chez Scheme), he thinks that meta-circular evaluators have to go:

An iterative approach is necessary because the more straightforward recursive approach of the meta-circular interpreter presented in Chapter 2 cannot properly support continuations or tail calls. … Tail calls cannot be properly supported unless the implementation (at the meta level) supports them properly.

So the “circular” part is sort of problematic — the information required to support continuations should be explicit and not implicit as with recursion in order to make sure that it can be saved and restored as data.

So what?

When people say “Lisp,” what comes to mind? Is it Clojure? Scheme? Common Lisp? I think a lot more about what type of Lisp that people want to talk about because they look the same, but all come out differently in terms of semantics, and even syntax sometimes.

I was fooled by the misnomer that “Lisp interpreters are easy” meant that it still applied to modern Lisps (particularly, Scheme, since its reputation as “the simple modern Lisp” is kind of there). I’m pretty sure that when people say “Lisp,” it’s closer to the original McCarthy Lisp instead of modern Scheme or modern Common Lisp.

Food for thought.

But that Ruby one supports continuations…

Ruby already has it with its own callcc.

Q: Why is Racket’s parsing system so complicated (Scheme)?

A: Circa 2016, Racket’s internal syntax representation system had to be revised because of macros, or “code rewriting” or “code-to-code procedures.” Matthew Flatt can explain in his paper, “Binding as Sets of Scopes” in POPL 2016, and also on his online notes on the new macro expander system for Racket (also formerly PLT Scheme).

Q: But who is Matthew Flatt?

A: Matthew Flatt is the most active contributor to Racket as measured by GitHub’s commit graph, circa 2017. He is also a member of the PLT Racket or Racket contributor group, and a faculty member at the University of Utah.

Q: Okay, so what?

A: Racket’s macro system recently got a rehaul to implement a new syntax system so that it keeps track of scopes. This is effectively so that they can implement hygienic macros more easily (e.g. syntax-case). Essentially, every single identifier can be associated with a scope given that the primitive expressions are indeed the primitive expressions.

In Scheme, even your keywords could be redefined.

Over the past few months, I’ve started the shaka-scheme project for my senior project at the University of Hawaii at Manoa, and I wanted to highlight two pieces of code that recently changed our entire course of development:

(define proc1 (lambda ()
  (define define 1)
  (display "define is actually ")
  ; This doesn't display a procedure literal...
  (display define)
  (display " in this scopen")))

(define proc2 (lambda ()
  (define-syntax define
    (syntax-rules ()
      [(define a b) (display "fooled youn")]))
  ; This doesn't do the normal define...
  ; And it's not an error.
  (define 1 2)))

In Scheme, you can redefine what are called primitive expression symbols inside of non-global scopes, such as within a lambda expression. If you run this using GNU Guile or Racket, you’ll find that this code runs flawlessly — it prints out:


So what?

On the Shaka Scheme project, we decided to go with a tree-based approach — essentially, we would build an AST in-memory during parsing, and then evaluate the AST through tree traversal.

Untitled Diagram.png
A conceptual diagram of our tree-based AST.

This seemed like a relatively simple way of doing things. We would adopt a supposedly more efficient primitive (the IDataNode tree with children stored in a std::vector), and we would also get a nice AST if we ever wanted to debug in the future.

Unfortunately, the design called for strict primitive expression forms that assumed that (define a b) was the only valid form — because why wouldn’t it be? Of course, that all changed when you realize that (define a b c) could be valid in a certain context.

In that instant, the “fixed” structure of the various AST specifications we were using to represent the parsed forms of the various Scheme expressions basically broke. In addition, the fixed “parsing” grammar defined for the top-level <expression> rule can no longer look ahead for (define and correctly assume that the rest of the <define> rule will look like: ( define <identifier> <expression> ) as define itself could possibly be a completely different procedure… or even just 1.

The solution

The solution is to simply redo the entire evaluation and parsing specification to make NO ASSUMPTIONS.

Scheme, like most other Lisps, are notorious for being runtime-heavy in the sense that figuring out the types of things and the bindings of identifiers is harder to do statically without going down the rabbit hole of more rigorous analysis.This rings especially true here, as almost any primitive expression keyword can be a complete different procedure given the right context and redefinitions. Therefore, all parsing must be agnostic to the idea that define means only the usual define procedure, and every single expression is simply a list waiting to be evaluated (unless, of course, you can prove that it does represent the usual define expression in that context).

Second of all, Scheme lists may not be more elegant when stored as in-memory trees where the list is represented as a root with its children as the elements. The reason being is that car  and cdr may require taking slides of an array or vector of children, and keeping track of the size of slices of a vector is somewhat more cumbersome than simply using pointer access to check of the existence of the next element. Why keep track of slices or trees if one can simply do the usual, canonical representation of Scheme lists as linked lists?

Third of all, Scheme R7RS macros become significantly more complicated without the uniform structure of “everything is a list,” as traversal over non-uniform trees makes parsing on macro-derived expression types becomes harder to reason about.

For example, parsing (let ((x 1) (y 2)) (+ x y)) (which is a classic macro-implemented or derived expression type, according to R7RS) requires knowledge that

  • let is a macro
  • ((x 1) (y 2)) is not a procedure call, but a list of syntax primitives that will be manipulated by the macro

How is one to decide these things before actually evaluating the value of the let identifier in the current environment? There is no answer without more rigorous static analysis of the scope in question.


I personally chose Scheme as “the easy” language to implement as a semester-long “programming language-focused” project in C++. Unfortunately, the extremely dynamic nature of Scheme (let’s not get started on continuations…) means that implementing its finer parts is definitely not a trivial task.

Lisp is an especially interesting case for evaluation, because so little things are known at compile-time. The extremely flexible semantics of Scheme gives its implementation a sort of hardness that surpasses that of something like a MIPS processor simulator.

If you’re still not convinced of the non-toyness of Scheme, why not read some of the latest Scheme standard, R7RS?Scheme is a relatively full-featured language compared to the Lisps of old, and sports the following features:

  • Continuations with call/cc
  • Expression-based macros with the define-syntax/syntax-rules system (standing apart from the R6RS syntax-case macro system and the classic defmacro system)
    • Note that C has token-based macros with its preprocessor. Expression-based macros are more powerful.
  • A full-featured numeric system, with support for true fractional/rational types, inexact/exact number differentiations, and complex numbers.
  • Optional lazy evaluation
  • Records (also known as product types and comparable to C’s struct)
  • Strings, with some optional Unicode support
  • Bytevectors and vectors, which represent linear data structures with fixed-size
  • Exceptions
  • Library/Module system


C++: Template duck-typing in 30 lines


struct Duck {
    void quack(std::ostream& out) {
        out << "Quack!" << std::endl;

struct Platypus {
    void quack(std::ostream& out) {
        int* i = NULL;
        out << *i << std::endl;

void do_quack(std::ostream& out,
        MaybeDuck duck) {


int main() {
    Duck duck;
    Platypus platypus;

    do_quack(std::cout, duck);
    do_quack(std::cout, platypus);

C++ has compile-time duck typing

(See it live on Coliru)

The C++ compiler doesn’t actually do much semantic checks in terms of whether it can “figure out” whether the type you give a template is correct.

All it does is kind of “copy-and-paste” the type you give it into the placeholder type, and sees if it compiles correctly.

Here, our two classes will both quack(), but will not necessarily do the same
thing. One will print; the other will most likely give a segmentation fault, because you’re trying to dereference a  NULL pointer.

In other words, C++ does not do special semantic checking on templates unless you do it yourself.

Hence, this program crashes.

So what’s the solution?

There are many.

Is intuition in programming important to you?

Note: Read only the bold statements if you want to skim this quickly.

Do you remember the first time you saw classes? Objects? The word “string?” What about binary trees, regular languages, compilation of languages, and recursion?

I don’t realize it myself a lot of times, but I’ve come a long way in changing my mindset to handle the terminology of programming ideas and make them seem natural. I think others are the same.

Do you remember the day when you didn’t understand anything of programming terminology?

Today, hundreds or thousands of people are in that same situation. Today, many people do not know what a “segmentation fault” is or why their program is crashing. They don’t know why a reference for WinMain@16 is missing from their MinGW program that makes use of wxWidgets when first tangling with Code::Blocks, and they certainly won’t know how monads relate to endofunctors.

Do you remember how you first grasped your first to most recent ideas of programming?

I remember my personal experience: it was late at night trying to figure out why my 5th project in C++ was not compiling and I began helplessly throwing questions at Google and Stack Overfow.

I had no idea what I was doing, and — who’s to blame? — I just didn’t know what I was doing.

I remember that my account was question-banned on Stack Overflow for trying to ask questions as an independent learner, one or two semesters before I would ever encounter my first C++ program in college.

Of course, Stack Overflow has a strict policy on repeating questions that have already been answered, and an unwritten, unspoken policy on downvoting really basic questions — which I quickly learned after the fact.

But after three months of hitting roadblocks in compilation and unable to pass barriers of discipline and knowledge required to go directly to documentation, I found myself on top of a basic GUI system implemented on-top of SDL with Lua scripting on the side.

Learning can be a frustrating and humiliating experience; it can also be thrilling and easy given the right mindset and the right resources. To me, learning is not about wading through descriptions of mind-constructs and precise, compiler-like terminology to simply parse the mechanisms by which we build, arduously and untiringly — we make and mutate what is.

It’s not about the methods by which we gather information — we do not learn by reading documentation, but, rather, by the methods by which we internalize, understand, and distill such knowledge.

Learning is compressing methods and ideas intuitively so that they can be reconstructed and utilized at will.

Who is aiming to precisely teach “how to compress methods and ideas intuitively so that they can be reconstructed and utilized at will” for basic programming concepts? For searching and understanding basic programming terminology?

Opponents of doing such might give the arguments that “we don’t baby people around” and “people are capable of Googling the most basic of concepts,” but I don’t blame a baby for not being able to walk when all he or she can do is crawl.

If you remember the time that you searched “What is the best programming language?” on the Internet, you may remember that the best answer that  was given was no answer at all. People even think that asking such as question is wrong (and “wrong questions” do exist, in a sense).

I tried to write a tutorial myself. I worked on a C++ tutorial blog a few months back. And it’s horrible. It’s a mish-mash of concepts that are out-of-scope, poorly explained and written, and are not focusing on compression of knowledge into intuition. There is no notion of audience or aim; just a spewing of bits of knowledge that fail to present a coherent whole.

I now think that it’s not about just simplicity or a short word count, but teaching concepts so that they become intuitive to the receiver.

Many people want to learn programming today. Today, many people are starting at ground zero, with no equipment to learn for themselves. And today, you are the more experienced.

I’m writing this because I want to ask three questions and see what other people think:

  1. Is intuition in programming important for you? How would you teach correct intuition to a beginner?
  2. What concepts were important for you for sustained learning in programming?
  3. What resources do people need to map out programming for themselves and say to themselves: “This is what I know,” “This is what I don’t know,” and, “This is how I’m going to learn this?”

What is a type, and how do you think about it?

I was recently asked by a person who was going to take his first C programming class next semester to show him what programming was like. So I went over the compilation process, showed him some C++ code, and tried to talk about interface vs. implementation and what an API is.

But one of the hardest questions I came across was,

“What is a type?”

Now, as a student programmer, you kind of have to think about it. People do not get a whole mumble-jumble about the core idea behind types — we simply get something like,

“Oh, there just are different types, you know. You know the difference between an integer and a decimal number, right? Well, in C, they’re called int and float!”

What’s a person to think?

At this point, somebody who has never encountered the idea of types has to think: “What is a type?” Just imagine yourself asking these questions:

Is a type like different numbers? Is 2.0 a different type than 1 than 1.0? If they’re both decimals, are they the same type? But if they’re the same value, aren’t they exactly the same type of number? Does having the same value mean having the same type?

And later:

Is struct pair {int i; int j;} the same as int i; int j;? I mean, they both contain the same data right? One is a pair of numbers, and the other is a pair of numbers. But why do I have this compiler error? My function expects a pair of two numbers, but I gave it two numbers! Why?!

And even more later:

What is a monad? Is it a type, or a function on types? But isn’t a function itself a type? And a construction of a type with a monadic constructor will “trigger” certain functions? And what do you mean a monad doesn’t require a join operation?!

Somewhere along the line, naive intuition fails. And, having nowhere to go, what’s a person to do?

You’re not going to get a formal talk on type theory until you decide you want one. But if you want to build a correct intuition of what a type is, maybe it’s time for some research.

The Philosophy of Types

I’m not going to get a touchy-feel-y and truly intuitive intuition of anything by reading a computer manual or a hard technical paper that’s already expecting you to already have a strong sense of what a type is. For this, I chose to go to the Stanford Encyclopedia of Philosophy.

Why? Because the idea of what a type is has deeper roots than type theory’s birth by Bertrand Russell alongside Russell’s Paradox for naive set theory. I’m sure that people have been categorizing things (and people) into types long before set theory was formalized (e.g. humans into male and female). And philosophy will provide intuition on more than just a pure mathematical level.

From Section 3: Types and Universals:

Are types universals? They have usually been so conceived, and with good reason. (See the entry on properties.) But the matter is controversial. It depends in part on what a universal is.

“But what’s a universal?” I asked:

Universals, in contrast to particulars, have been characterized ashaving instances, being repeatable, being abstract, being acausal, lacking a spatio-temporal location and being predicable of things. Whether universals have all these characteristics cannot be resolved here. The point is that types seem to have some, but not all, of these characteristics.

Where predicable is defined as “capable of being asserted,” where predicate is defined as “something that is affirmed or denied of the subject in a proposition in logic” by the Merriam-Webster dictionary.

This seems to be closer to what a property is:

Properties (also called ‘attributes,’ ‘qualities,’ ‘features,’ ‘characteristics,’ ‘types’) are those entities that can be predicated of things or, in other words, attributed to them. Moreover, properties are entities that things are said to bear, possess or exemplify.

In other words, if there’s a sentence like, “The sky is blue,” here, “The sky” is the object, and “is blue” is the property.

Continuing on, skipping ahead a little bit:

When it comes to being predicable, however, most types diverge from such classic examples of universals as the property of being white or the relation of being east of. They seem not to be predicable, or at least not as obviously so as the classic examples of universals. That is, if the hallmark of a universal is to answer to a predicate or open sentence such as being white answers to ‘is white’, then most types do not resemble universals, as they more readily answer to singular terms. … It is also underscored by the observation that it is more natural to say of a token of a word—‘infinity’, say—that it is a token of the word ‘infinity’ than that it is an ‘infinity’. That is to say, types seem to be objects, like numbers and sets, rather than properties or relations; it’s just that they are not concrete particulars but are general objects—abstract objects, on some views. If, then, we follow Gottlob Frege (1977) in classifying all objects as being the sort of thing referred to by singular terms, and all properties as the sort of thing referred to by predicates, then types would be objects. Hence they would not fall into the same category as the classic examples of universals such as being whiteand being east of, and thus perhaps should not be considered universals at all. (Although perhaps all this shows is that they are not akin to properties, but are their own kind of universal.) A general exception to what has just been claimed about how we refer to types (with singular terms) might be inferred from the fact that we do more often say of an animal that it is a tiger, rather than that it is a member of the species Felis Tigris. This raises the question as to whether the species Felis Tigris is just the property of being a tiger, and if it isn’t, then what the relation between these two items is.

This last sentence raises the question: is being a member of the set of members defined to be within a type the same as being a type?

Well, we could say yes. But object “tiger” is not the same as the abstract notion of “the tiger” as a type.

We could say no, but the tiger object is certainly an instance of a type of tiger.


But what is a type?

From Section 4: What is a Type? (with selected parts put in bold) —

The question permits answers at varying levels of generality. At its most general, it is a request for a theory of types, the way set theory answers the question “what is a set?” A general answer would tell us what sort of thing a type—any type—is. For example, is it sui generis, or a universal, or perhaps the set of its tokens, or a function from worlds to sets, or a kind, or, as Peirce maintained, a law? These options are discussed in §4.1. At a more specific level, “what is a type?” is a request for a theory that would shed some light on the identity conditions for specific types of types, not necessarily all of them. It would yield an account of what a word (or a symphony, a species, a disease, etc.) is. This is in many ways a more difficult thing to do.

In other words, the answer is almost a non-answer. “What is a type?” I asked. “Well, it depends on what your definition of what a type is…” replied the article.

Reading on further, it asks a question:

Is it a set?

It might appear that types are better construed as sets (assuming sets themselves are not universals). The natural thought is that a type is the set of its tokens, which is how Quine sometimes (1987, p. 218) construes a type. After all, a species is often said to be “the class of its members”. There are two serious problems with this construal. One is that many types have no tokens and yet they are different types. For example, there are a lot of very long sentences that have no tokens. So if a type were just the set of its tokens, these distinct sentences would be wrongly classified as identical, because each would be identical to the null set. Another closely related problem also stems from the fact that sets, or classes, are defined extensionally, in terms of their members. The set of natural numbers without the number 17 is a distinct set from the set of natural numbers. One way to put this is that classes have their members essentially. Not so the species homo sapiens, the word ‘the’, nor Beethoven’s Symphony No. 9. The set of specimens of homo sapiens without George W. Bush is a different set from the set of specimens of homo sapiens with him, but the species would be the same even if George W. Bush did not exist. That is, it is false that had George W. Bush never existed, the species homo sapiens would not have existed. The same species might have had different members; it does not depend for its existence on the existence of all its members as sets do.

So, it’s not just a set. A type that is missing one of its numbers does not disappear because one of its members does.

But what if we want that type with a missing number to be a distinct type? I think that having an int but not 0 type can be quite helpful and is quite distinct from int as a type.

I think the next paragraph is quite good:

Better, then, but still in keeping with an extensional set-theoretic approach, would be to identify a type as a function from worlds to sets of objects in that world. It is difficult to see any motivation for this move that would not also motivate identifying properties as such functions and then we are left with the question of whether types are universals, discussed in §3.

If a world can be called a set, then a type is a function to map a set onto subsets of itself. I think that’s a fair definition: classification of objects depends on there being objects to classify and make divisions in classification.


Is it a kind?

From the section on it:

The example of homo sapiens suggests that perhaps a type is a kind, where a kind is not a set (for the reasons mentioned two paragraphs above). Of course, this raises the question of what a kind is; Wolterstorff (1970) adopts the kind view of types and identifies kinds as universals.

So a kind is not a set but is a type.

Instead he views the type as what he calls the archetype of the kind, defined as something that models all the tokens of a kind with respect to projectible questions but not something that admits of answers to individuating questions. Thus for Bromberger the type is not the kind itself, but models all the tokens of the kind. We shall see some difficulties for this view in §5 below.

That’s fair… the concept of a type as an archetype for a set of objects that subscribe to that archetype is reasonable.

Is it a law?

Thus types have a definite identity as signs, are general laws established by men, but they do not exist. Perhaps all Peirce meant by saying they do not exist was that they are “not individual things” (6.334), that they are, instead what he calls “generals”—not to be confused with universals. What he might have meant by a “general law” is uncertain. Stebbing (1935, p.15) suggests “a rule in accordance with which tokens … could be so used as to have meaning attached to them”. Greenlee (1973, p. 137) suggests that for Peirce a type is “a habit controlling a specific way of responding interpretatively.” Perhaps, then, types are of a psychological nature. Obviously two people can have the same habit, so habits also come in types and tokens. Presumably, types are then habit types. This account may be plausible for words, but it is not plausible for sentences, because there are sentences that have no tokens because if ÎŚ,Ψ are sentences, then so is (ÎŚ & Ψ) and it is clear that for Peirce “every [type] requires [tokens]” (2.246). And it is much less plausible for non-linguistic types, like types of beetles, some of which have yet to be discovered.

I do like the idea of having a type be the way of interpreting tokens or instances of objects. But then the question becomes, “Does an object intrinsically possess a type?” Here, it appears that the answer is, “That depends on how people categorize it.” Once again, I think this can be seen as a relationship between a universal set of objects into subsets that satisfy the type.

But so what?

I think Section 5 does a good job at creating a “So What?” moment for us:

The relation between types and their tokens is obviously dependent on what a type is.

I agree. There exist multiple possible ways to look at the abstract idea of a type.

If it is a set, as Quine (1987, p. 217) maintains, the relation is class membership.

Then we can defer to set theory (or one of them, of which a popular choice would be ZFC), a basis for modern mathematics.

If it is a kind, as Wolterstorff maintains, the relation is kind-membership.

The idea of a kind is a bit foggy on first read, but a kind looks to be something that models all the objects within the “set” or “category” of a particular kind “with respect to projectible questions” but for which answers to “individuating questions” is not valid. That is, I can ask you about a property of a kind, but I can’t ask you to say, “Where was Homo sapiens, as a species or type, on New Year’s Eve” because Homo sapiens is a species, not a specific object.

If it is a law, as Peirce maintains, it is something else again, perhaps being used in accordance with. (Although Peirce also says a token is an “instance” of a type and that the token signifies the type.)

Seems kind of ambivalent in nature on first observance.

Nonetheless, it has often been taken to be the relation of instantiation, or exemplification; a token is an instance of a type; it exemplifies the type.

And to this, I think we can agree: things that are of a type exemplifies the type, or, things that are of a type are examples of a type.

Though, that seems pretty useless. An identity property isn’t anything special.

That wasn’t much of a “so what?” moment…

I think, from reading the article on types and tokens, we can assume the following:

  • Types are usually interpreted as universals that cannot be picked out and given individual object properties. That is, we can talk about “All int have some value between the range of a and b,”, but we can’t really say, “The type-object int itself has a value that is between a and b.” No. A type is not a “physical object,” but more like an abstract object that is a stand-in for all of the objects that is admitted to a type, but cannot be picked out itself validly and observed “physically” or “in particular.”
  • Types can exhibit set properties if you adopt such a philosophy. This means that if we match our own philosophy of types to set theory, we can have all the power and work already done on ZFC and, by possible extension, category theory. This has worked out very well — we can see that functional programming is a very valid paradigm of programming, with links to category theory.
  • The mechanism of recognizing objects as “tokens” or “instances” of a type should be defined. And, we’ve come up with different ways of handling the decision of types: simply typed lambda calculus, dependent type theory, refined types. Most programming languages provide basic types that are already defined, and operations by which to define other types based on those basic types in a recursive manner until you reach those base types again.

As a programmer and computer engineering candidate, I think the most popular application of type theory is in type systems for the various programming languages and compilers. Most of the popular languages of today have some notion of type and use it for both static and dynamic type checking, as well as compile vs. run-time type checking for safety and checking the validity of programs.

As the computer deals with the abstract notion of information, we can consider the following questions:

Final Questions

  • How does type theory mix with information theory, especially as applied in physical hardware architecture and abstract software architecture and design?
  • How is type theory applied to type systems for programming languages in the “real world,” and why is it popular? What are its direct productivity and safety gains and properties that make them almost universal among modern programming languages?
  • What are some algorithms for type system-related operations that are efficient? Are some type systems better than others with respect to time complexity of type checking algorithms, etc.

But wait, what would you say to a beginner?

  • Types represent a class of objects that all have something in common. For example, the int type is a class of integers which are all between the range of such and such.
  • In C and similar abstraction-level languages, all types are interpretations on the language level from the actual binary information that fuels most computer architectures. The same sequence of bits can represent multiple things depending on what lens of “type” is being used. But this matters less as more abstraction is applied, and the coupling between the raw binary representation of data and our type interpretation is less coupled or close.
  • How you decide whether something is of a type depends on the language, but most often, we just declare certain pieces of data to already be a type, and then, the system takes it from there, and limits the way we program based on how the system allows us to manipulate the types. And you can always define more ways to go between types.