Text Layouting Status Report

So, it took me two weeks to finally get around to a status report, and Peter even had to request it forcefully. But I have not been idle in this time. Let's see.

Pango? Harfbuzz? SDL_TTF!

I spent quite some time investigating Pango and Harfbuzz; I blogged about it in my previous post. I knew back then already that Pango was not ideal and that even Wesnoth was not using Pango for their fancy help rendering (they use it for some other types of text though). I then investigated Harfbuzz further, and though it seems like it will one day become what we need, it is not there yet. In fact, it does not even have a stable version released at the moment. It can handle international text pretty well, but it does not help with the left-to-right, top-to-bottom layouting of text, nor with breaking text into paragraphs.

So we are back to square one: the old and trusted SDL_TTF. I designed my code around SDL as the graphics engine; the text rendering is kinda plugged in and uses SDL_TTF, but it should be easy to replace it with any other text engine once one establishes itself as the one - be it Pango, Harfbuzz or anything else in the future.

What works?

I just finished my initial implementation of the engine - it has all the features that Venatrix and myself have marked as desirable. Some of the features (like tables) are rather rudimentary, but I think it should be enough for us. The whole engine is currently a stand-alone library - in the next step, I will plug it into the Widelands code.

But let's move on to pretty pictures. Here are some examples.

<rt>
<sub padding=5 background=000000>
<sub width=350 background=win_bg.png padding=2>
<p align=center>
   <font size=30 color=F2DF91 underline=true>This is a heading</font>
</p>
<vspace gap=2>
<sub margin=5 padding=5 background="ffffff" float=left>
   <p><img src="wl-ico-128.png"></p>
   <p align=center>
      <font size=10>This is the subtext for the image. Kinda like a
      caption.</font>
   </p>
</sub>
<p indent=5>
<font color=FEFE00 size=14>
   Lorem ipsum dolor sit amet, consetetur sadipscing elitr, <font
   italic=true ref="hyperlink">sed diam nonumy</font> eirmod tempor <font
   underline=1 ref="somereference">invidunt ut labore et</font> dolore
   magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo
   <font bold=true>duo dolores et ea rebum</font>. Stet clita kasd
   gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
</font>
</p>
</sub>
</sub>
</rt>

This code renders to the following picture:

Sample image

This already shows off some of the features: changing fonts inside a paragraph, sub-layouts that can float inside the text, margins and paddings for some elements (though this is not as fully featured as in HTML) and of course images. It also contains a ref attribute for a font tag, which makes that part of the text clickable. This will make hyperlinks in the help possible.

Tables are kinda possible as well:

<rt>
<sub width=10></sub><sub width=240>
<p>
<sub width=80 background=638c57><p>Hello</p></sub><space><sub
width=80 background=42deb7><p>Nice</p></sub><space><sub
width=80 background=43f447><p>World</p></sub>
<br>

<sub width=80 background=404542 valign=center><p
   align=center>Upsy</p></sub><space><sub
width=80 background=e21038><p>And more</p></sub><space><sub
width=80 background=0c7e8f valign=center><p>Thats all</p></sub>
<br>

</p>
</sub>
</rt>

Each row is a set of three fixed-width cells separated by a <space> tag which expands as needed (in this case not at all). The rows are separated by a linebreak tag <br>. This kinda results in a table with fixed column count and widths. Good enough for Widelands, I hope:

Something like a table

Bullet point lists are possible as well:

<rt>
<p><font size=20 underline=1>A bullet list</font></p>
<sub padding=10 width=490>
<p>
<sub valign=top><p><space fill=" "><img
   src=bullet.png></p></sub><sub width=450>
<p>
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy
eirmod tempor.
</p>
</sub>

<sub valign=top><p><space fill=" "><img
   src=bullet.png></p></sub><sub width=450>
<p>
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy
eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam
voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet
clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit
amet.
</p>
</sub>

<sub valign=top><p><space fill=" "><img
   src=bullet.png></p></sub><sub width=450>
<p>
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy
eirmod tempor.
</p>
</sub>
</p>

</sub>
</rt>

This uses a lot of sub-layouts to achieve the desired result. It does not read very nicely, but which markup does? I plan to hide all of this in some Lua functions, just as we are already doing for scenarios, to make it nice to use as well. This wall of text renders as

Bullet point list

What's next?

The next step is to integrate this layout engine into Widelands. Caching will be really important here, because rendering with SDL_TTF is very slow. I feel that the technical challenges of getting this engine on board will be surmountable; the one thing I am really not looking forward to is converting all the rich texts that we already have to work with the new engine. Especially the hard-coded texts inside the source, the files in txts/ and the early campaigns (barbarians and empire) will be a lot of tedious work. I hope some people hop on board to help me out with that.

Soo, expect a branch with this engine soonish on launchpad.

Widelands Marathon Begins

My personal Widelands Marathon

So, I finally handed in my PhD thesis. That does not mean that everything is said and done; there is still an oral examination, a 30-minute talk and some demonstration stuff to be done. But it means that I am now officially out of my previous job.

That means that before I start at Google in September, I have around three months to use however I see fit (okay, okay, I also need to find a flat, move, prepare for the oral examination, prepare the demo, and organize a huge celebration. And I have some other obligations remaining). I want to use a lot of that time for Widelands. To keep me on track and motivated, I decided to keep a development diary on this blog.

First Project: Make the Widelands Help awesome

Venatrix, Astuur and others have begun the immense project of writing a help for Widelands. I have provided a minimal framework to give buildings a help window via a Lua script in the building's directory. This is already immensely useful and significantly better than no help at all, but it feels rather limited: Widelands' text handling and layouting capabilities are rather limited, and there is currently no way to document anything other than buildings. My first project for now is to improve this situation.

Text layouting

Text layouting is a difficult problem, especially when we also consider internationalization (just think of Chinese, for example). I started out by looking into the various libraries that are around for this problem. There are two contenders: the first is Pango, the second Harfbuzz.

Pango

Pango is the text layouting engine from the Gnome desktop environment, and it is well suited for that task. Using Cairo for rendering and its own (limited) markup language, it is next to trivial to render a UTF-8 string containing various languages into a rectangular shape. This is not enough for what I want though: just think about an image inside a paragraph. Text should flow around it, therefore it needs to be broken into a non-rectangular shape. This is not easy with Pango: it does provide a low-level API to reflow text independently, but it is awkward and one loses a lot of what Pango is good at (like transparent handling of right-to-left text blocks).

Also, Pango does not allow choosing the TTF font file used for rendering directly, instead relying on fontconfig to find a suitable system font. In Widelands, we would prefer to feed in our own font files - so we can be sure that the text looks the same, no matter what OS is used.

Pango is stable and used in a lot of products like Gnome. It is also what Wesnoth uses for most of its rendering needs. But for the (cool) Wesnoth help, it is in fact not used, likely because of the reflow problem I mentioned earlier.

Documentation for Pango is available in the form of Doxygen output. This is a nice reference, but hardly more useful than the source code itself. There is also no easy tutorial that goes beyond the PangoLayout example, which can only render in rectangles. In particular, there is no example of how to add custom markup tags or how to do low(er) level layouting. The best I found is an old mailing list post that explains the steps needed to lay out text oneself. I was able to stitch those hints together into a binary that does just that. But it is really awkward and it does not offer a lot of the functionality that the PangoLayout wrapper offers. I kinda feel this is a dead end for now.

Harfbuzz

Harfbuzz is a slightly lower-level API that is in fact used by Pango. It allows loading TTF files directly and offers more control over layouting. For rendering it also uses Cairo. The problem here is that there is no stable version - according to the homepage, the API might change until 1.0; the current version is 0.6. And there is no documentation. Nada!

Luckily, there is a GitHub user who provided an example that is very close to what I want to achieve and that shows off a lot of the features of Harfbuzz. The API changes were already felt in this one-year-old example: some of the functions take different arguments now than they did back then, but it was really easy to figure out with the Harfbuzz source code nearby. When run, the program shows the following window:

text

I do not yet know whether Harfbuzz is also able to break text according to grammatical rules, but this looks promising.

Alright, that's it for my two hours of Widelands for today.

Fourth episode of UltiSnips Screencast

UltiSnips Screencast Episode 4

I got around to doing the final episode of my little screencast series on UltiSnips. This time I talk about Python interpolation and give an example of a snippet that really makes my life easier.

I always wanted to have a few screencasts showing off the main features of UltiSnips, but nobody was willing to take the time and effort to produce them. Now that I've done it myself I can understand why: it takes a lot of time. From start to finish - planning, recording, cutting, uploading and promoting - one screencast of roughly ten minutes takes approximately six to eight hours. It was a lot of fun doing the casts though, and I learned a lot. I am also reasonably pleased with the final results.

If someone suggests a really good topic for a fifth screencast, I will consider making one. But I feel that UltiSnips' main features are well covered now. Thanks for sticking with me throughout the series!

Third episode of UltiSnips Screencast

UltiSnips 2.0 and Screencast Episode 3

I just released UltiSnips 2.0, together with a short screencast to run you through the new features.

2.0 is a killer-feature release: having normal mode editing and $VISUAL support adds a lot of value to UltiSnips for me.

UltiSnips now has all the features I initially expected from a snippet plugin when I started writing it. It was not an easy road getting here, but I am proud of what we have achieved. I consider it now mostly feature complete - I will not object to good ideas for new features that enhance the user experience, but we will try as best as we can to avoid feature creep. UltiSnips wants to be only one thing: the ultimate solution for snippets in Vim.

The Statistical View on Least Squares

Least Squares and Statistics

Last time I talked about the beautiful linear algebra interpretation of least squares, which adds a wonderful geometric interpretation to the analytical footwork we had to do before. This time I will borrow from a video I saw on Khan Academy a long time ago. It connects some of the elemental principles of statistics with regression.

The basic idea is similar to the analytical approach, but this time we will only try to fit a line to (x, y) pairs. So we have a simple line model with a slope and an intercept.

The sum of squared errors between model and data is then:

Of course, we search for the global minimum of this error with respect to the two parameters, the slope and the intercept. So let's find the derivatives:

We can conveniently drop the 2 that appears in front of all terms. We will now rewrite these equations by using the definition of the sample mean:

This gives:

Let's lose the sums and solve for the slope and the intercept.
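
For reference, here is the whole derivation in compact form (a sketch, writing m for the slope, b for the intercept and \bar{x} for the sample mean of the x values):

S(m, b) = \sum_i \bigl(y_i - (m x_i + b)\bigr)^2

\frac{\partial S}{\partial m} = -2 \sum_i x_i (y_i - m x_i - b) = 0, \qquad \frac{\partial S}{\partial b} = -2 \sum_i (y_i - m x_i - b) = 0

Dividing by the number of points and using the sample means gives \overline{xy} = m \overline{x^2} + b \bar{x} and \bar{y} = m \bar{x} + b, and therefore

m = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2} = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}, \qquad b = \bar{y} - m \bar{x}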

If you look closely at the term for the slope, you see that it is actually just the covariance of x and y divided by the variance of x. I find this very intriguing.

Second episode of UltiSnips Screencast

UltiSnips Screencast Episode 2

Second screencast is done and ready! If you have not yet seen the first, look over here. This was done under the maxim "Done is better than perfect". I had more advanced explanations planned, but the screencast turned out to be longer than expected. I therefore opted for a complete overview of the most common features of UltiSnips and will explain some of the cooler stuff in this blog post.

So here is the video. You can find some snippets with their explanation below.

Elements inside of default text

Placeholders can contain other UltiSnips elements. This is totally cool and very useful. For example, the following snippet is defined for the html file type and ships with UltiSnips:

snippet a
<a href="$1"${2: class="${3:link}"}>
    $0
</a>
endsnippet

It allows you to either delete the complete tabstop 2 with one keystroke, or jump to tabstop 3 and replace the CSS class name.

Another example:

snippet date
${2:${1:`date +%d`}.`date +%m`}.`date +%Y`
endsnippet

This snippet will enter the current date in the format that is used in Germany: dd.mm.yyyy. Jumping forward will allow you to either overwrite the day or the day and the month. I use this in my todo files to enter due dates.

Complete Scripts in Shell Interpolation

Shell interpolation is as powerful as the shell itself. Here is a silly example that is basically a complete Perl script:

snippet random
`#!/usr/bin/perl
use LWP::Simple qw(get);
my $page = get "http://www.random.org/integers/?num=1&min=1&max=100&col=1&base=10&format=plain&rnd=new";
print $page`
endsnippet

This snippet will insert a truly random number between 1 and 100 fetched from random.org. Useful? Maybe not, but a good example.

Transformations are more powerful than you might think

This transformation shows the transformation option g which means global replace. It will uppercase every word in the first tabstop while mirroring its content.

snippet title "Titelize in the Transformation"
${1:a text}
${1/\w+\s*/\u$0/g}
endsnippet

In the replacement part of this transformation, the \u means "uppercase the next character" and the $0 matches everything found. The complete transformation basically says: find a word \w+ followed by any amount of whitespace \s* and replace them with what you found, but uppercase one character. The g at the end makes sure that not only the first but all words are uppercased.

Conditional replacements

One feature that I didn't mention in the screencast is conditional replacements. When you capture a group in the search part of a transformation, you can check in the replacement whether it was matched, like so: (?<number>:yes text:no text). yes text will be inserted if the group was matched, otherwise no text will be inserted. A quick example:

snippet cond
${1:some_text}${1/(o)|(t)|..*/(?1:ne)(?2:wo)/}
endsnippet

The transformation will match an o at the beginning into group 1 or a t at the beginning into group 2. The rest of the search string just matches everything in the first tabstop. The replacement will either be ne when an o was matched, wo when a t was matched, or an empty string. That means you can just type a t to quickly get two as the complete content of the snippet, or an o to get one. If you type anything else, you only get the content of the placeholder, i.e. verbatim what you typed.

Inword triggers

The snippet option i stands for inword trigger. Let's say that for some coding project I often want variables that end in izer: maperizer, vectorizer... I want to define a snippet with the trigger ri which should expand to rizer. Triggers are normally only matched against full words though - except when you use the i option for your snippet:

snippet ri "rizer" i
rizer
endsnippet

This does what I want: maperi<tab> -> maperizer.

Regular Expression triggers

Regular expression triggers can be used with the r option of a snippet. Note that you have to add delimiters around your trigger when using regular expression triggers. I use the following snippet because I can never remember which exact trigger I set for my chapter snippet:

snippet "chap?t?e?r?" "Latex chapter" rb
\section{chapter}
   $0
\end{chapter}
endsnippet

This will expand on either cha, chap, chapt, chapte or chapter. Regular expression triggers are even more powerful because you can access the match object inside your Python interpolation blocks. More on that in the next episode.

Regular expression triggers also allow defining triggers with special characters, whitespace, or multiple words.

snippet " this is a trigger" "Multiword & Whitespace" r
Hello World
endsnippet

Conclusion

The features we have discussed so far are enough to make crazy snippets and will suffice for most uses. If you ever feel lost or forget how to use one of the features, take a look at the UltiSnips help document, either via

:help UltiSnips

or here online.

Next time we will discuss Python interpolation, which brings even more power to snippets. With it, some of the features we discussed so far will become easier to work with and others will only then become possible. With Python interpolation there is basically nothing that a snippet can't do.

As always, leave comments and let me know what you think about the screencast and the tips. Also let me know what you are interested in and what you would like to see in future screencasts. I have no more topics for a fourth screencast, so if you want one, let me know what you want to see in it.

Move window done right

Move window to the rescue

Overlapping windows are a bad invention. Frankly, the only UI concept that works is virtual desktops with non-overlapping windows (e.g. tiling window managers). Granted, the world is slowly becoming sane, with iOS leading the way: every application is full screen there.

On Mac OS X, the situation is dire: windows just open wherever they want to. A while ago I wrote a Python script that could be called quickly from Quicksilver or LaunchBar; it took a simple syntax describing where you want your window and moved it there. The problem: it was rather slow. It needed to run and parse a command line tool to get the number of screens and their resolutions, and it had to ask for the frontmost application via AppleScript.

I got fed up with that yesterday and made it fast. Getting the frontmost application and the number of screens and their resolutions is now done via the Carbon framework in a C extension. You can find the source code on GitHub; download it from there and use pip to install:

https://github.com/SirVer/move_window

Usage

You want to call this from LaunchBar or Quicksilver, so install/make a proper AppleScript (I ship one in contrib). The tool takes the following syntax:

move_window <screen_id><nr of x parts><x span><nr of y parts><y span>

Some examples

move_window 0     # Disp 0, full Screen
move_window 02    # Disp 0, left half of screen
move_window 030   # Disp 0, left third of screen
move_window 031-2 # Disp 0, rightmost two thirds of screen
move_window 12121 # Disp 1, bottom right quarter of screen

Feedback and pull requests are very welcome!

First episode of UltiSnips Screencast

UltiSnips Screencast Episode 1

I finally got the technicalities sorted out, some inspiration and the time to make a first screencast about UltiSnips. It is not so much about the specifics of UltiSnips, but rather an overview of what snippets are and a demo reel of some of UltiSnips' features.

Let me know in the comments what you would like to see in future episodes!

Update: There is now a second screencast.

The Linear Algebra View on Least Squares

A Linear Algebra View on Least Squares

Last time, we introduced the least squares problem and took a direct approach to solving it. Basically, we just minimized the L2 distance between a model and the data by varying the parameters of the model. Today, we find a completely different, but very neat interpretation.

Suppose we have the following problem: there is a set of linear equations without a solution, e.g. a matrix A and a vector b such that no vector x exists that would solve the equation Ax = b.

Bummer. What does that actually mean, though? If we plug every possible value of x into this equation, we get essentially all linear combinations of the column vectors of A. That is a span of vectors. That is a subspace. So b is not in our subspace, otherwise we would find an x that would solve our linear equation.

Here is a visualization in 3D when A is a matrix with two columns:

2D Matrix with Vector not in its Column Space

The column space of the matrix A is a plane, and b is not in that plane. A natural question now is: what is the vector in the column space of A that best approximates b? But what is a good approximation? Well, how about the Euclidean distance between the endpoints of the vectors. If we choose this criterion, there is a simple solution: the projection of b onto the column space of A gives the best approximation. Here is another image:

Image with projected vectors

We have decomposed b into its best approximation in the column space, which we call p, and a component that is orthogonal to it, which we call e. At the end of the day, we are interested in a best approximate solution to our original equation, so we are not really interested in finding p itself, but rather an x̂ so that A x̂ = p.

Let's look for an equation for this x̂. We will start out by using the fact that the two components of b are perpendicular to each other, and we will also express e through vector addition as e = b - A x̂.

Since the factor in front cannot be zero, the term in parentheses must be. This directly yields the least squares formula again:
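
Spelled out (a sketch using the names A, b, p, e and x̂ from above), the chain of steps is:

A^\top e = 0
\Rightarrow\; A^\top (b - A \hat{x}) = 0
\Rightarrow\; A^\top A \hat{x} = A^\top b
\Rightarrow\; \hat{x} = (A^\top A)^{-1} A^\top b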

Conclusion

We did the exact same thing as in the last post, but this time we took another approach and quite a different world view on the problem. The least squares solution came up very naturally here and with a very simple geometric interpretation (at least in the 3d case, the approach is also valid for other dimensions though).

The last iteration of this series will take a look at least squares in statistical models.

A direct look on least squares

Least squares

Here we go again. After a long time of travelling and illness, there is something new on this blog. Today I start a new mini series that looks at least squares in various ways. I want to show how least squares arises naturally in various areas - of course this will all look a little similar because it remains the same principle, but the basic approach depends on the context.

Today we start with a simple example and the method I'll call the direct approach.

The direct method

Suppose we have some kind of measurement process that takes a measurement every second, and we know that the system we measure follows a model of third order, that is, the values we expect follow a cubic polynomial in time.

But obviously, our measurement process is in some way noisy. Our aim is now to find good values for the model parameters. But what is good? Well, why not start by minimizing the distance between our model prediction and the measured values?

First plot showing a bad model fit and the errors we want to reduce.

In this plot we see our measurements as blue dots and an awful model fit as a red line. And we see various bars. Those are the errors we want to minimize: the model prediction at a point minus the value that we measured at that point:

It seems like a bit of a stretch to write this simple equation in matrix notation there in the last step. But it will benefit us in a moment.

We do not want to minimize the plain sum of those errors, because positive and negative values might cancel out. The absolute value is also not so easy, because the error function then becomes non-smooth. So let's take the sum of the squares of the individual errors:

Minimization is easiest when we have a quadratic error function, because there is only a single global minimum, which can be found by setting the gradient to zero.

The notation of this minimization becomes a bit tedious; that's why we go back to the matrix notation we used earlier and note that we can stack all the measurements we have into one matrix. Working with vectors will make the derivatives easier to do:

Our error function then becomes

Finding the gradient of S with respect to the parameter vector is now easy. We also immediately set this equal to the zero vector to find the minimum of this function:

And so we find the parameters that minimize the error.
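
In matrix form (a sketch, writing c for the parameter vector, y for the stacked measurements and A for the matrix whose rows contain the powers of the measurement times), the chain of steps is:

S(c) = \lVert A c - y \rVert^2 = (A c - y)^\top (A c - y)

\nabla_c S = 2 A^\top (A c - y) = 0 \;\Rightarrow\; A^\top A \, c = A^\top y \;\Rightarrow\; c = (A^\top A)^{-1} A^\top y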

Using this with the above example we get the following best fit

Best fit for the model

Conclusion

I call this the direct approach because we directly set out to minimize this quantity, and the least squares formula pops out. There are other goals one can set out to achieve that lead to the same formula, which I will show in some of the next posts. Hopefully it will not take as long until I find time to post again.

The bridge between Lagrange and Bezier interpolation

Comparing Lagrange and Bezier interpolation

Let's continue our discoveries in Chapter 7 of the Book by discussing my solution to 7.P1.

We have talked about the de Casteljau algorithm before. It is an expensive but numerically stable way to calculate the points on a Bézier curve given its control polygon. Just to remember, this is how it is calculated:

Given the points b_0, ..., b_n and a parameter t, set:
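
A sketch of the standard recursion, in the usual notation (the b_i are the control points, t the curve parameter):

b_i^0(t) = b_i, \qquad b_i^r(t) = (1 - t)\, b_i^{r-1}(t) + t\, b_{i+1}^{r-1}(t), \qquad r = 1, \dots, n, \quad i = 0, \dots, n - r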

Then the last remaining point is the point with parameter t on the corresponding Bézier curve.

Another common approach to interpolation is to use polynomials of degree n. We then also need parameter values t_i that correspond to the given points; if they aren't given, we can just choose them equally distributed between 0 and 1.

Aitken's algorithm is then defined by:

Given the points with their corresponding parameter values t_i and the position of the interpolation t, set:
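
Again a sketch of the standard recursion, in the same style:

p_i^0(t) = p_i, \qquad p_i^r(t) = \frac{t_{i+r} - t}{t_{i+r} - t_i}\, p_i^{r-1}(t) + \frac{t - t_i}{t_{i+r} - t_i}\, p_{i+1}^{r-1}(t), \qquad r = 1, \dots, n, \quad i = 0, \dots, n - r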

Then the last remaining point is the point with parameter t on the corresponding interpolating polynomial.

The geometric interpretation is the following: we start out with points 0 and 1 and interpolate linearly between them, then we take points 1 and 2, and so on. In the next step, we linearly interpolate between the linear interpolations of the last round by blending one into the other. The degree of the interpolating polynomial increases by one in each step.

The difference between de Casteljau's and Aitken's algorithm in each step is quite simple: the first one maps from the interval [0, 1] while the second one maps from the interval [t_i, t_{i+r}].

We can provide a straightforward combination of the two algorithms by mapping from an interval that is itself linearly blended between these two with a factor α; that is, we linearly interpolate between the two intervals. Let's define a quick helper function:

Using this, the generalized algorithm looks like this

Given the points with their corresponding parameter values, a blending factor α, and the position of the interpolation t, set:

Then the last remaining point is the point with parameter t on the corresponding blend between Bézier and polynomial interpolation.

The de Casteljau algorithm and Aitken's algorithm are the two extreme cases of the blending factor. Here is a plot of a planar interpolation:

Blending between Lagrange and Bezier with various values of alpha

See how the curve at the Lagrange end of the blend passes through all 4 points. This is a property of Lagrange interpolation (polynomial interpolation). The curve at the Bézier end is the only one that stays in the convex hull of the points; this is a property of Bézier curves.
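
To make the blending concrete, here is a small Python sketch of how the generalized recursion could look (the function names are hypothetical and the convention alpha = 0 for Bézier, alpha = 1 for Lagrange is my own choice; the actual implementation lives in the repository linked below):

import numpy as np

def blended_parameter(t, t_lo, t_hi, alpha):
    # Blend the de Casteljau interval [0, 1] (alpha = 0) with Aitken's
    # interval [t_lo, t_hi] (alpha = 1), then map t into the blended interval.
    lo = alpha * t_lo
    hi = alpha * t_hi + (1.0 - alpha) * 1.0
    return (t - lo) / (hi - lo)

def blended_interpolation(points, ts, t, alpha):
    # points: control/data points, ts: their parameter values.
    # Returns one point of the curve that blends Bezier (alpha = 0) and
    # Lagrange/polynomial interpolation (alpha = 1).
    pts = [np.asarray(p, dtype=float) for p in points]
    n = len(pts) - 1
    for r in range(1, n + 1):
        next_pts = []
        for i in range(n - r + 1):
            u = blended_parameter(t, ts[i], ts[i + r], alpha)
            next_pts.append((1.0 - u) * pts[i] + u * pts[i + 1])
        pts = next_pts
    return pts[0]

With alpha = 0 this reproduces the de Casteljau construction, with alpha = 1 Aitken's algorithm, and values in between give the blended curves from the plot above.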

As always, you can find the complete code for this post in the GitHub repository under chap07. Also, Farin has written a paper on this very topic. You can find it here.

Better Navigation

Better Navigation on this Blog

I just added a calendar widget to each page where it makes sense. I am not sure how useful it is, so I will try it out for a while and remove it again if it turns out not to be.

I also improved the blog archive list in the right navigation bar. I was inspired by something I saw on Guido's blog, namely a folding tree view. I am aware that most blogging tools like Wordpress have this feature built in, but I realized for the first time how useful it is on Guido's page. So I copied it. It turned out that you only need a little bit of Javascript and CSS. I basically followed the short but concise tutorial by David Herron. I only needed a few adjustments, and they were trivial thanks to the beauty that is jQuery.

Degree Reduction of Bezier Curves

Continuing with my reading efforts in Curves and Surfaces for CAGD by Gerald Farin, I will talk today about a topic that comes up in chapter 6: elevating and reducing the degree of a Bézier curve.

Elevating the dimensionality

First, we start out by proving that the degree of a Bézier curve can always be elevated without the curve being changed.

A curve with the control polygon b_0, ..., b_n can be represented as a Bézier curve using the Bernstein polynomials as basis:

We will now prove that we can increase the degree from n to n+1 and still keep the same curve, given that we choose the new control polygon accordingly.

We start out by rewriting the curve as

And now we show that the terms can be rewritten as a sum from 0 to n+1. Let's start with the first term:

Easy. Now the second one

Now we plug this into the summation formula. We can readjust the summation indices cleverly by noting that we only add zero terms, and we get the formula:

Now all that is left to do is to compare the coefficients of the old control polygon points and the new polygon points, and we have our solution: the new points are linear interpolations of the old ones. We can find the new points from the old by a simple matrix multiplication:

with the matrix being of shape (n+2) × (n+1) and looking something like this:
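
For reference, the well-known degree elevation formula that this matrix encodes (a sketch, writing b_i for the old control points and b_i^* for the new ones):

b_i^* = \frac{i}{n+1}\, b_{i-1} + \left(1 - \frac{i}{n+1}\right) b_i, \qquad i = 0, \dots, n+1

so that, with the points stacked into vectors, b^* = M\, b.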

Degree Reduction

The hard work is already done now: it only remains to say that we indeed find the best approximation to our Bézier curve (in the least squares sense) by computing the least-squares solution of the inverse problem of this matrix equation. This gives the best degree-reduced control polygon.
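
In formulas (a sketch, reusing the elevation matrix M from above): given the higher-degree control polygon b^*, the best degree-reduced polygon in the least squares sense is

b = (M^\top M)^{-1} M^\top b^*

which is just the usual least squares solution of the overdetermined system M b \approx b^*.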

And here is an example from inside Blender. The highlighted objects are the reduced control polygon (n=4) and its Bézier curve. The non-highlighted ones are the originals, one degree higher (5).

Reduced control Polygon and original + splines

As always, you can find the source code for this in the corresponding GitHub project.

Generators in C++

Generators are iterators over things that are made up at runtime. For example, if you ask me for the Fibonacci sequence, I can generate an infinite amount of its numbers by only remembering the two numbers that came before:

Generators are quite useful for programming. Python has them, and Haskell is basically built around them (I think they call them infinite lists). Generating the Fibonacci sequence up to a maximum number maxn looks like this in Python:

def fib(maxn):
    n, ln = 1, 1
    while n < maxn:
        yield n
        n, ln = ln, n + ln

for i in fib(1000):
    print i,

After my last blog post about C++0x features, I was intrigued to write something similar in C++. The corresponding main function also looks very nice with the new container iterate feature:

#include <iostream>

int main(int, char** argv)
{
   FibonacciGenerator k(1000);

   for(auto i : k)
      std::cout << i << " ";
   std::cout << std::endl;

   return 0;
}

So basically, we have a container here called FibonacciGenerator and we iterate over it with the new container iterate feature. For this to work, we have to implement the begin() and end() functions in the container and the values those functions return must implement operator++, operator* and operator!=. This goes something like this:

class FibonacciGenerator {
private:
   struct generator {
      generator(int n) : _n {n}, _ln{0} {}

      int & operator*() {
         return _n;
      }

      bool operator!=(const generator & other) const {
         return _n <= other._n;
      }

      generator& operator++() {
         int temp = _ln + _n;
         _ln = _n;
         _n = temp;
         return *this;
      }

      private:
         int _n, _ln;
   };

public:
   FibonacciGenerator(int maxn) : _maxn{maxn} {}
   generator begin() const {
      return generator(1);
   }
   generator end() const {
      return generator(_maxn);
   }

private:
   int _maxn;
};

The implementation is quite straightforward: all the (not very heavy) lifting is done in the generator. The inequality comparison checks whether the current number is still less than or equal to that of the other iterator; once the current number grows bigger, the comparison fails and the loop in main terminates. The dereference operator returns the current number and the increment operator calculates the next one. The container itself only contains the begin and end functions, which are called by the container iterate syntactic sugar.

Note that generators are nothing new. You can write them in exactly the same way in C++03. But somehow, the new Pythonic container iterate feature triggered me to think about them in C++ for the first time.

Some fancy new features of C++11

C++ 2011 bag of features

The now finalized new standard for the C++ language brings a bag of new and exciting features to C++. It will reduce verbosity and increase performance in a lot of areas. Of course, it has to stay backwards compatible; that is why some of the bigger and uglier flaws of C++ can't be changed. Still, it improves the language dramatically. Here is what I will enjoy the most.

Auto and container iterate

The following code annoys me:

std::vector<int> a;
for (std::vector<int>::iterator i = a.begin(); i != a.end(); ++i)
   cout << (*i) << endl;

It raises the following question: if I know that a is a vector and I know that a.begin() returns an iterator, why do I have to retell this to the compiler? It knows this as well. Luckily, it is no longer needed:

for (auto i = a.begin(); i != a.end(); ++i)
   cout << (*i) << endl;

Nearly a pleasure to read. For even shorter code, I would have used boost::foreach in the past, but this is now also included in the language. And its syntax is pretty:

for (auto v : a)
   cout << v << endl;

This will print all values inside the container a. Since a is a vector of int s, v will be an int as well.

Initializer lists

C++ now allows initializing even complex types with initializer lists, similar to how C did things. This will also replace the bracket constructor syntax for the most part, and that is a good thing, because initializations look more like initializations and not like function calls. The following is valid in C++0x:

std::vector<int> a = { 1, 2, 3, 4, 5 };

You no longer have to mess around with push_back all over the place.

Putting the features together, we can write nice non-verbose code. The following complete C++ program prints the contents of a std::map.

#include <iostream>
#include <map>
#include <string>

using namespace std;

int main(int argc, char const *argv[]) {
   map<string,int> cmap {
      {"Hello", 5},
      {"World", 10}
   };

   for (auto k : cmap)
      cout << k.first << ":" << k.second << endl;

   return 0;
}

static_assert and variadic templates

static_assert can be used to write asserts at compile time. Widelands uses this in some places to make sure that space optimizations that assume certain sizes of data types do not break down. In the past, you had to jump through some hoops to get this working. Now it is as easy as

static_assert(sizeof(long) == 8, "Your long, it is too small!");

Variadic templates are templates that can take a variable number of arguments. They are used to implement std::tuple for example.

But they can also be used to implement self-recursive templates that consume their input one element at a time at compile time. For example, the following program does precisely nothing at run time, but converts a binary string to an int at compile time:

template <char... digits> struct _2bin;

template <char high, char... digits> struct _2bin<high, digits...> {
   static_assert(high == '0' or high == '1', "no binary number!");
   static int const value = (high - '0') * (1 << sizeof...(digits)) +
         _2bin<digits...>::value;
};

template <char high> struct _2bin<high> {
   static_assert(high == '0' or high == '1', "no binary number!");
   static int const value = high - '0';
};

int main(int argc, char const *argv[])
{
   const int a = _2bin<'1', '1', '1', '1', '1', '0'>::value;
   static_assert(a == 62, "Didn't work out :(");

   return 0;
}

Extendable literals

This is still not supported in gcc 4.6, but the standard allows creating an operator for literal suffixes. The following is allowed by the standard but does not compile currently:

template<char... digits> constexpr int operator "" _b() {
    return _2bin<digits...>::value;
}

int main(int argc, char const *argv[])
{
   const int a = 111110_b;
   static_assert(a == 62, "Didn't work out :(");

   return 0;
}

Note the constexpr. It tells the compiler that it can evaluate this function at compile time, and so it will. The operator "" syntax defines a new literal suffix, _b in this case. It allows writing a literal as seen in the definition of a. Like all operators in C++, this basically adds syntactic sugar which makes code easier to read.

Extendable literals are fun to play around with and allow for a lot of domain-specific gravy in C++ code without run-time overhead. Still, they are more of an esoteric new feature of the language. For the specific problem discussed here - binary literals - you are better off using a compiler extension. Most compilers support binary literals in the form of 0b111110 these days.

Conclusion

C++0x brings a bag of nice features. All of the above except for extendable literals are already supported in GCC 4.6, but you need to pass -std=c++0x as argument on the command line while compiling.

If you want to know more, Wikipedia has a detailed rundown of the new features in C++0x. There are some that I did not discuss here and that I am not as excited about as the ones mentioned.

I am looking forward to seeing all those features become more widespread. As far as I am concerned, I will only program in C++0x in the future if I have the choice, and I will not weep for superseded features that slowly vanish into oblivion. Altogether, I feel the standardization committee did a great job with the new standard; it breathes new life into a language that we will be stuck with for years or maybe decades to come.