C++ concepts and programming by design

C++ concepts are an upcoming feature proposed for C++17. While the proposal is still a work in progress, the general idea behind concepts is easy to talk about, and it is largely language-agnostic.

What are C++ concepts?

From the above, Stroustrup writes:

“A concept is a compile-time predicate.”

Concepts, at least as described here, are a way of describing semi-generic sets of types that impose requirements on their members. In a sense, a concept is a category of types.

The following is an example pulled from the link above:

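// find() works on any type satisfying the Associative_container concept;
// Iterator_type and Key_type are associated aliases that extract the
// container's iterator and key types.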
template<Associative_container C>
Iterator_type<C> find(C&& assoc, const Key_type<C>& key)
{
  return assoc.find(key);
}

Usually, ensuring that types match up with a certain “category” of types is hard in C++. Inheritance is one solution, but forcing a type hierarchy can be awkward and add unneeded complexity, especially if a type has to satisfy more than one “category.” Another solution is to use std::enable_if and other compile-time, template-based tricks. As we can see here, once we can express categories of types directly in our code, an algorithm like “search an Associative_container, given a container value and a constant reference to a Key-like type, and return an Iterator for that container” reads quite naturally.
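For contrast, here’s a minimal sketch of what the same constraint might look like with the enable_if approach (the trait is_associative_container is hypothetical, invented for illustration; a real one would detect find(), key_type, and so on):

#include <map>
#include <type_traits>

// Hypothetical trait: false by default, specialized for the types we bless.
template<typename C>
struct is_associative_container : std::false_type {};

template<typename K, typename V>
struct is_associative_container<std::map<K, V>> : std::true_type {};

// The constraint is buried inside the return type instead of stated up front.
template<typename C>
typename std::enable_if<is_associative_container<std::decay_t<C>>::value,
                        typename std::decay_t<C>::iterator>::type
find(C&& assoc, const typename std::decay_t<C>::key_type& key)
{
    return assoc.find(key);
}

Compare how the concept version above states the requirement right in the signature, rather than encoding it in return-type machinery.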

What’s the big idea?

How can we perform logic based on categories rather than types?

This semi-generic abstraction over types is crucial when we think about its applications. It allows us to model and abstract our programs more closely in terms of design concepts.

Oftentimes, we reason in terms of concepts over types rather than just individual types:

  • Addition is a function that adds two numbers together.
    • The concept of a number can be expressed as a union of many different types: the natural numbers, the integers, real numbers, imaginary numbers, quaternions, etc.
  • The concept of a “string” or “number” or even “linear container” can be split across various implementations encoded as different types with a common interface. Forcing inheritance on them is an invasive move from a maintenance standpoint.
    • Letting multiple implementations subscribe to a common interface is the name of the game in many languages; classically, Java has its interface mechanism, but its opt-in scheme is as invasive as inheritance, since each class must declare what it implements. C++ concepts are opt-in by inspection, with the checking done automatically.
  • Many types can be move-constructible but not copy-constructible. The various permutations of movable/non-movable and copyable/non-copyable can be tedious to figure out by hand.
    • The concepts MoveConstructible and CopyConstructible can simplify things by turning the check for common features or a common interface into a simple compile-time type check and expression evaluation, with fewer errors (see the sketch after this list).
  • Categories are more generic than types; substitution is easier when we can substitute any member of a class of types rather than just one.
    • For example, finding a substitute for a grain for breakfast is much harder if you need to find a substitute specifically for quinoa.
      • If you have no concept of what a “grain” is, how do you know that quinoa is a grain without having to declare a large inheritance tree?
        • What if something can serve as a substitute for a grain but isn’t actually a grain? Is duck typing okay for inheritance when a duck isn’t a grain?
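As a concrete sketch of those last few bullets (written with the std::movable and std::copyable concepts as they were later standardized in C++20; the exact syntax is illustrative, not the proposed C++17 one):

#include <concepts>
#include <memory>
#include <string>

// unique_ptr can be moved but never copied; string can do both.
static_assert(std::movable<std::unique_ptr<int>>);
static_assert(!std::copyable<std::unique_ptr<int>>);
static_assert(std::movable<std::string>);
static_assert(std::copyable<std::string>);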

You’re stupid and you write so bad. What’s your point?

Constraint violations are diagnosed at the point of use, just like type errors. C++98 (and C++11) template instantiation errors are reported in terms of implementation details (at instantiation time), whereas constraint errors are expressed in terms of the programmer’s intent stated as requirements. This is a fundamental difference. The diagnostics explain which constraints were not satisfied and the specific reasons for those failures.

  • Concepts, most importantly, are considered a way to solve the extremely dense compilation-error problem with templates.
    • Any C++ programmer who has had to deal with compilation errors from libraries like Boost, or even from an object as “basic” as `std::vector<T>`, knows that template diagnostics can be terrifying and soul-wrenching to debug. Having to dig deep into the implementation or the online documentation whenever something doesn’t compile is a major slowdown when you’re trying to fix errors and get on with development.
  • People want to program with their own concepts in C++, not just with specific types or with the pre-defined concepts of C++.
    • Stroustrup mentions that an implementation of concepts was created to test them out, even applying concepts to an implementation of the standard library. I’d bet that figuring out that your object can’t be copied because it holds a std::unique_ptr<Node<T>> instead of a Node<T>* would be much faster if the compiler simply told you that std::unique_ptr<Node<T>> is not a Copyable type, instead of telling you that it couldn’t match the copy constructor’s types or that the function was deleted (see the sketch after this list).
  • The concept, or, rather, the semi-generic interface, is a more efficient abstraction than inheritance for a “category of types.”
    • Concepts are closer to the idea of a “category of types with similar functionality” than inheritance is, because a concept is not itself a type. As in math, a set is not the same as its elements.
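Here’s a minimal sketch of that diagnostic difference (again using the later C++20 syntax; store and Node are hypothetical names made up for this example):

#include <concepts>
#include <memory>

template<typename T>
struct Node { T value; };

// The constraint is stated right in the interface: only copyable types allowed.
template<std::copyable T>
void store(T value) { /* ... */ }

int main() {
    store(Node<int>{42});                    // fine: Node<int> is copyable
    // store(std::make_unique<Node<int>>()); // error: the compiler reports that
    //                                       // std::unique_ptr<Node<int>> does not
    //                                       // satisfy std::copyable
}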

3 Reasons For Modularization

I’m pretty sure most of the non-beginner people in production already know about this, but I wanted to write about this because I’ve had a moving experience with modules recently.

It’s occurred to me that it’s hard to really get the importance of modularization on the first try. I recently watched a class of C programmers learning object-oriented programming in C++, and while some people got modularization along with encapsulation, others didn’t get it on their first attempt. My stance is that modularization is probably one of the most crucial lessons for “getting” programming, and it’s also the one that imposes order against entropy.

I mean, just being able to perform namespacing is a form of modularization and separation if used conventionally, and namespacing is a pretty crucial feature once we start a fiesta with more than maybe 10 libraries/modules/packages/gems/etc. (see: R packages; aliasing on import and dplyr/plyr).

Really, a lot of things come from the concepts of modularization and DRY working together. “Modules? Libraries? Packages? Static vs. dynamic linking? Interfaces? Functions? Objects? Who cares?” the coder might think, annoyed by seemingly large projects full of pointless cruft when everything could just be put into main().

But those are core concepts! Getting somebody to intimately understand the “why,” beyond just reciting it for show, might take a little more.

It’s really hard to explain to somebody why modularization matters until they actually encounter the problem. It’s something you almost need to experience, either as an immersive thought experiment that lets you feel the frustration, or by going down the wrong path and living it. Maybe I’ll write about that sometime.

Either way: here’s a good visual for modularization and 3 reasons why it’s a good thing:

[Figure: two images contrasting your code with modularization against your code exploding with NO modularization.]

1. It reduces mind load.

Separation of concerns is one of the best things about modularizing code; it’s also part of what enables procedural and functional programming. Being able to reuse a piece of code wherever its purpose repeats, without having to retrace your steps and program it again, is great!

Additionally, some people say that the human mind can focus on about four to seven distinct things at once. If that’s so, then the mind can only consider a limited number of distinct “steps” in a procedure before it has to consolidate several of them into one “umbrella” step, the way thousands of lines of code can collapse into a single compression function or a call that means “allocate this piece of memory for me.”

The human brain doesn’t seem capable of focusing on every single line of code simultaneously. Modularization reduces the mental load on the programmer by hiding the underlying details and wrapping them up into one nice “capsule” of concept or usage, leaving more “mind-space” for other concepts.

2. It makes for easier debugging.

If you have a program to bake a cake, and main() contains everything from ordering the supplies to delivery, when the entire thing crashes, where in the code do you check first?

“Well, it’s in main(),” one might say. “But, main() is 25,000 lines long…”
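Here’s a minimal sketch of the alternative, with hypothetical function names standing in for the real work:

#include <cstdio>

// Hypothetical decomposition of the 25,000-line main(); each stage is a module
// that can be tested, and therefore ruled out, on its own when something crashes.
void order_supplies() { std::puts("ordering supplies"); }
void bake_cake()      { std::puts("baking cake"); }
void deliver_cake()   { std::puts("delivering cake"); }

int main() {
    order_supplies();
    bake_cake();
    deliver_cake();
}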

Some people might turn to step-wise debugging with something like GDB. Others might turn to test cases. Embedded programmers might have to deal with print-like debugging with LED switches. A few might move to static analysis tools (squee!!!).

Ultimately, your debugging is going to have to take you back to where the error is.

Modularization reduces the space (and therefore the time) in which you need to look for bugs. Being able to narrow the search down to the part of the program that might actually contain the error, because the other modules have already been tested or shown to be bug-free under normal operating conditions, is extremely valuable! Not having to look for bugs in, say, standard library functions is a huge productivity boost. You want to be squashing bugs in your program’s code, not in the code that holds it up, whether you wrote that code or not.

This sort of “possibility space” reduction is also crucial for certain algorithms, such as binary search. Being able to place things into different “buckets” of concepts or steps is always helpful.

3. It enables abstraction.

Being able to “block out unnecessary details” is a big part of interfaces of all kinds. For a given interface, hiding implementation details is often seen as a security and usability boost, but, again, it also simplifies how somebody uses the interface and thinks about using it.

Power, simplicity, and consistency might be a few words to describe what good interfaces have in common. Simplicity, coherence, and consistency are exactly what modularization provides. What better starting point for abstraction?

Modularization enables abstraction by being the mechanism that ties details together into a tight abstraction. A “facade API” that breaks when people don’t play by the rules isn’t going to work out well once somebody decides to try to break your system. Having the ability to turn your abstract concepts and rules into an encapsulated system is a very powerful tool in your toolset as a programmer.

So what does that mean when I code?

I guess we could turn it into a list like this:

  • Smaller is better. It’s also the heuristic for…
  • Simple is better. Simple things usually take less to code.
  • One procedure, one purpose. The Unix philosophy was made with this in mind, and we know how successful that’s been.
  • Close your modules and systems. A closed system is such that its guts won’t fall out if you shake it up. Code that doesn’t break inside its own assumptions is crucial for stability. Not doing this in, say, C, will often result in something like a segmentation fault. (Because WHY would THIS pointer be NULL?)
  • Diagram and flowchart between “modules.” It’s not easy drawing a flowchart with only one bubble. If your modularized code corresponds to bubbles, then as you split up your code, the relationships between your states and transitions make more sense as you begin to more closely map bubbles to individual steps in a more complex process.
  • Leave the door open by keeping things small. One of the dangers of modularizing and abstracting is that your designs can easily break. When a client comes to you and asks, “Hey, can you add this feature?” and you didn’t allow for it in your abstraction, suddenly you’re stuck with an entire system that can’t bend without breaking.
    • In this situation you’re left with a few choices: create a strong, flexible initial design; solidify and cement your design according to the specifications or requirements once they can no longer change; make the system easy to extend as a non-closed (but carefully guarded) system; or don’t take your system too seriously, and get ready to scrap and rebuild.
    • Perhaps the best idea is to build from primitives small enough that changes work like mixing and matching puzzle pieces–just take some out and shove other ones in. It’s much easier to work with small puzzle pieces than to saw apart a large block of code, replace the particular parts that need to change, and hope everything fits back together.

A bite on SAT and proving programs

What’s “SAT?” SAT is shorthand for a well-known computer science problem: the Boolean satisfiability problem. It’s simple: given a formula of Booleans like this:

(A' + B) · (C · D)

…can we figure out an assignment to all the variables to make the entire expression true?

Well, the answer is “yes:” all you have to do is try everything! But, er, that’s a bit much: if you have 4 variables, that’s 2^4 = 16 possible assignments. And exponentials are fast-growing: 2^5 is 32, 2^10 is 1,024, and 2^20 is already over a million.
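A minimal brute-force sketch for the example formula above, just to make the 2^n blow-up concrete (the formula is hard-coded for simplicity):

#include <cstdio>

// Evaluate (A' + B) · (C · D) for one assignment of the four variables.
bool formula(bool a, bool b, bool c, bool d) {
    return (!a || b) && (c && d);
}

int main() {
    // Try all 2^4 = 16 assignments; each bit of mask is one variable.
    for (int mask = 0; mask < 16; ++mask) {
        bool a = mask & 1, b = mask & 2, c = mask & 4, d = mask & 8;
        if (formula(a, b, c, d))
            std::printf("satisfiable: A=%d B=%d C=%d D=%d\n", a, b, c, d);
    }
}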

The other fascinating thing about SAT is that different restrictions of it fall into P and NP. 2-SAT, a product of sums in which each clause has only 2 literals, is in P, while 3-SAT is NP-complete. Why is 2-SAT in P? Because a 2-literal clause is exactly an implication in disguise: the clause (x + y) is equivalent to the pair of implications x' → y and y' → x, so the whole formula becomes an implication graph that can be checked for contradictions in polynomial time.
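As a small worked example in the clause notation above: a 2-literal clause is two implications,

(x + y) ≡ (x' → y) ≡ (y' → x)

so, for instance, the 2-SAT formula (A + B) · (A' + B) · (A' + B') yields the implications A → B and A → B', which together force A to be false; then A' → B forces B to be true, and the assignment A = 0, B = 1 satisfies the formula.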

SAT may have applications in other places. If we consider the variables to be, say, input conditions to your program, and a satisfying assignment to be one that crashes the program, then being able to solve the problem in polynomial time would be very helpful, rather than having to fall back on an exponential brute-force search.

Attributes and static program traits

I recently came across this article: Hotpatching a C function on x86. It’s a really cool article because it takes advantage of some compiler-specific and pretty low-level machinery in order to do actual code patching on the popular x86-64 architecture.

But, if you take a look at some of the code:

#include <stdio.h>

/* ms_hook_prologue: emit a Microsoft-style hotpatchable prologue;
   aligned(8): align the function entry so the patch can be written atomically;
   noinline/noclone: keep exactly one patchable copy of the function. */
__attribute__ ((ms_hook_prologue))
__attribute__ ((aligned(8)))
__attribute__ ((noinline))
__attribute__ ((noclone))
void
hello(void)
{
   __asm(""); /* empty asm: the article uses this to inhibit unwanted optimizations */
   puts("hello");
}

Well, all those __attribute__ tags look kind of annoying.

Really–can’t there be a C-like language, or some sort of meta-tool that hooks in at the language level or lower, to cleanly separate implementation details from interface details?

This would also require meticulously surveying the semantics of a language and then translating them into traits, but it would give compiler developers, language developers, and programmers alike a clear degree of control over implementation details.
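For what it’s worth, C++11’s standardized bracket attribute syntax already gestures in this direction. A minimal sketch, assuming a GCC-style compiler that accepts gnu:: vendor attributes (ms_hook_prologue is omitted since I’m not sure it’s exposed this way):

#include <cstdio>

// Roughly the same intent as the __attribute__ pile above, in C++11 syntax.
// [[gnu::...]] attributes are vendor extensions; availability varies by compiler.
[[gnu::noinline, gnu::noclone, gnu::aligned(8)]]
void hello()
{
    std::puts("hello");
}

int main() { hello(); }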

I’d imagine that maybe D or Rust might be able to do this sort of thing. Perhaps I can look into this.