How to Receive Sourcecode Location From Flex/Bison

I finally found a page with working code that reports the lexer's position in the source code in case of an error. It was damn hard to find any good information, mostly because Flex and Bison are two tools typical of 1970s C programs: shitty.

I'm copying the page here in case the original goes down.


Advanced Use of Flex

In this section we will develop a scanner for arithmetic, which will later be used together with a Bison-generated parser to implement an alternative implementation of M4's eval builtin. Our project is composed of:

yleval.h: a header common to all the files
ylscan.l: the scanner for arithmetic
ylparse.y: the parser for arithmetic
yleval.c: the driver for the whole module

Because locations are extremely important in error messages, we will aim for absolute precision: we will track not only the line and column where a token starts, but also where it ends. Maintaining them by hand is tedious and error-prone, so we will insert actions at appropriate places for Flex to maintain them for us. We will rely on Bison's notion of location:

typedef struct yyltype
{
  int first_line, first_column, last_line, last_column;
} yyltype;

which we will handle thanks to the following macros:

LOCATION_RESET (location) Macro
Initialize the location: first and last cursor are set to the first line, first column.
LOCATION_LINES (location, num) Macro
Advance the end cursor by num lines and, of course, reset its column. A corresponding LOCATION_COLUMN is less needed, since it would simply increase the last_column member.
LOCATION_STEP (location) Macro
Move the start cursor to the end cursor. This is used when we read a new token. For instance, denoting the start cursor S and the end cursor E, we move from

      1000 + 1000
      ^  ^
      S  E

to

      1000 + 1000
         ^
        S=E
LOCATION_PRINT (file, location) Macro
Output a human-readable representation of the location to the stream file. This hairy macro aims at producing compact locations by factoring common parts: if the start and end cursors are on two different lines, it produces 1.1-2.3; otherwise, if the location is wider than a single character, it produces 1.1-3; and finally, if the location designates a single character, it results in 1.1.

Their code is part of yleval.h:

/* Initialize LOC. */
# define LOCATION_RESET(Loc)                  \
  (Loc).first_column = (Loc).first_line = 1;  \
  (Loc).last_column =  (Loc).last_line = 1;

/* Advance the end cursor by NUM lines. */
# define LOCATION_LINES(Loc, Num)             \
  (Loc).last_column = 1;                      \
  (Loc).last_line += Num;

/* Restart: move the first cursor to the last position. */
# define LOCATION_STEP(Loc)                   \
  (Loc).first_column = (Loc).last_column;     \
  (Loc).first_line = (Loc).last_line;

/* Output LOC on the stream OUT. */
# define LOCATION_PRINT(Out, Loc)                               \
  if ((Loc).first_line != (Loc).last_line)                      \
    fprintf (Out, "%d.%d-%d.%d",                                \
             (Loc).first_line, (Loc).first_column,              \
             (Loc).last_line, (Loc).last_column - 1);           \
  else if ((Loc).first_column < (Loc).last_column - 1)          \
    fprintf (Out, "%d.%d-%d", (Loc).first_line,                 \
             (Loc).first_column, (Loc).last_column - 1);        \
  else                                                          \
    fprintf (Out, "%d.%d", (Loc).first_line, (Loc).first_column)

Example 6.14: yleval.h (i) -- Handling Locations
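
To make the macros above concrete, here is a small demo program. It is not part of the original article (the file name and the token scenario are my own), but it exercises all four macros and shows the three output forms of LOCATION_PRINT:

/* demo.c -- my own sketch, not part of the original module.
   Assumes the yyltype struct and the LOCATION_* macros from
   Example 6.14 are available through yleval.h.  */
#include <stdio.h>
#include "yleval.h"

int
main (void)
{
  yyltype loc;
  LOCATION_RESET (loc);           /* both cursors at 1.1 */

  loc.last_column += 4;           /* as if "1000" had been matched */
  LOCATION_PRINT (stdout, loc);   /* prints "1.1-4" */
  putchar ('\n');

  LOCATION_LINES (loc, 1);        /* a newline was read */
  loc.last_column += 3;           /* three more characters on line 2 */
  LOCATION_PRINT (stdout, loc);   /* prints "1.1-2.3": spans two lines */
  putchar ('\n');

  LOCATION_STEP (loc);            /* start the next token */
  loc.last_column += 1;           /* a one-character token */
  LOCATION_PRINT (stdout, loc);   /* prints "2.4": single character */
  putchar ('\n');
  return 0;
}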

 

Because we want to remain in the yleval_ name space, we use %option prefix, but this also renames the output file. Because we use Automake, which expects flex to behave like Lex, we use %option outfile to restore the Lex behavior.

%option debug nodefault noyywrap nounput
%option prefix="yleval_" outfile="lex.yy.c"

%{
#if HAVE_CONFIG_H
#  include <config.h>
#endif
#include <m4module.h>
#include "yleval.h"
#include "ylparse.h"
Example 6.15: ylscan.l -- Scanning Arithmetics

 

Our strategy for tracking locations is simple (see Flex Actions). Each time yylex is invoked, we move the first cursor to the last position in the user yylex prologue. Each time a rule is matched, we advance the ending cursor by yyleng characters, except for the rule matching a newline; this is performed by YY_USER_ACTION. Each time we read insignificant characters, such as white space, we also move the first cursor to the latest position. This is done in the regular actions:

/* Each time we match a string, move the end cursor to its end. */
#define YY_USER_ACTION  yylloc->last_column += yyleng;
%}
%%
%{
  /* At each yylex invocation, mark the current position as the
     start of the next token.  */
  LOCATION_STEP (*yylloc);
%}
  /*  Skip the blanks, i.e., let the first cursor pass over them.  */
[\t ]+     LOCATION_STEP (*yylloc);
\n+        LOCATION_LINES (*yylloc, yyleng); LOCATION_STEP (*yylloc);

The case of the keywords is straightforward and boring:

"+"        return PLUS;
"-"        return MINUS;
"*"        return TIMES;
...

Integers are more interesting: we use strtol to convert a string of digits into an integer. The result is stored in the member number of the variable yylval, provided by Bison via ylparse.h. We support four syntaxes: 10 is decimal (equal to… 10), 0b10 is binary (2), 010 is octal (8), and 0x10 is hexadecimal (16). Notice the risk of reading 010 as a decimal number with the naive pattern [0-9]+; you can either improve the regular expression or rely on the order of the rules. We chose the latter.

  /* Binary numbers. */
0b[01]+   yylval->number = strtol (yytext + 2, NULL, 2); return NUMBER;
  /* Octal numbers. */
0[0-7]+   yylval->number = strtol (yytext + 1, NULL, 8); return NUMBER;
  /* Decimal numbers. */
[0-9]+    yylval->number = strtol (yytext, NULL, 10); return NUMBER;
  /* Hexadecimal numbers.  Note the double brackets: [[:xdigit:]] is
     the POSIX character class; [:xdigit:] alone would match the
     literal characters ':', 'x', 'd', 'i', 'g' and 't'. */
0x[[:xdigit:]]+ yylval->number = strtol (yytext + 2, NULL, 16); return NUMBER;

Finally, we include a catch-all rule for invalid characters: report an error but do not return any token. In other words, invalid characters are neutralized by the scanner:

  /* Catch all the alien characters. */
.   {
      yleval_error (yycontrol, yylloc, "invalid character: %c", *yytext);
      LOCATION_STEP(*yylloc);
    }
%%

where yleval_error is a variadic function (like fprintf) and yycontrol is a variable; both will be defined later.
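
As a rough idea of what such a reporting function looks like, here is my own reconstruction (not the original code; I leave out the yycontrol argument, which the real function also receives):

/* Sketch only: print LOC, then the formatted message, to stderr. */
#include <stdarg.h>
#include <stdio.h>

static void
yleval_error (const yyltype *loc, const char *format, ...)
{
  va_list args;

  LOCATION_PRINT (stderr, *loc);
  fputs (": ", stderr);
  va_start (args, format);
  vfprintf (stderr, format, args);
  va_end (args);
  fputc ('\n', stderr);
}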

This scanner is complete; it merely lacks its partner, the parser. But that is yet another chapter…
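
Still, for orientation, here is roughly what the parser side has to declare so that yylex receives yylval and yylloc as pointers, the way the scanner above uses them. This is a hedged sketch of ylparse.y, not the original file (which is not reproduced on this page), so details such as the token list may differ:

/* Sketch of ylparse.y declarations -- assumed, not the original.
   A pure parser with locations makes Bison call
   yylex (&yylval, &yylloc), which is why the scanner writes
   through yylval-> and yylloc->.  The yycontrol state is passed
   as an extra argument; the exact mechanism depends on the
   Bison version.  */
%pure_parser
%locations

%union
{
  long number;
}
%token <number> NUMBER
%token PLUS MINUS TIMES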

The Compiler Builder Toolkit

There is an Open Source project out there that aspires to replace GCC as the predominant Open Source compiler system. And while that is still a long way down the road, their C++ compiler Clang is already very good, and the whole toolkit is used extensively in Apple system software.

What’s good about this is that the Intermediate Representation (IR), a sort of virtual RISC processor, is well documented and allows even people with little experience to build their own compiler. Although such a compiler will probably not be competing with Visual C++ anytime soon, it’s a good way to learn about compilers.
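
For a taste of how readable the IR is, here is a minimal function in its textual form (my own hand-written example, not from the LLVM docs): it adds two 32-bit integers and reads like typed assembly for that virtual processor:

define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}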

If you’re into this sort of stuff, check it out.

 

Another Fatality

I just came across this article here: http://www.drdobbs.com/architecture-and-design/farewell-dr-dobbs/240169421

This was kind of a shock to me, but it fits right into the tenor of my last post. Although in Germany we did not read Dr. Dobb’s Journal a lot, the fate of many German IT journals was the same, even before the internet reached the level it has now. Magazines became nothing but test publications with the eternal repetition of “the 10 best printers” etc. They adjusted earlier to a shifting demand, I believe. The kind of people who dealt with computers in the ’80s and ’90s were more the active type, partly because there wasn’t much software and people had to program in order to do anything with their machines.

With the advent of more software came the decline in user qualification. More and more people used computers who didn’t even know how such a machine works, and consequently the IT magazines catered to this clientele. So there are a bunch of those magazines still out there, but their content is flat. Nothing like Dr. Dobb’s Journal or, in Germany, c’t Magazine, which is still out there but has turned into one of those simplified magazines.

Times change, and not for the better, that’s for sure. When I hear programmers talk these days, I have the feeling they live in a different world. It’s all about abstract concepts, meta language features, internet connectivity, and yes, “the cloud” (whatever that is). It’s all hot air. Do applications get better in these days of “esoteric IT specialists”? Not that I can see. There is nothing I would do in the latest version of MS Office that I could not do in Office 2003.

When you read about the latest features of the C++ programming language, it sounds like Jean-Paul Sartre discussing French existentialism in French. I have no idea who even needs all this. And yet it is so complicated that it has the potential to break applications by means of subtle programming mistakes that may take forever to fix. That’s why I stay away from most of this stuff and only use a few of the new features.

Well, whatever, time moves on, but to me it looks more and more like a “The Emperor Has No Clothes” effect.

 

Where have all the applications gone?

Sometimes it feels strange.

You find a blog site or something that interests you in a search engine, a year or two old, and you go there and read and you think: now that is nice, I’m going to give it a try. And you click on the link and you get a site that reads something like this:

Meebo was acquired by Google on June 4, 2012.

Our team is now working just a few miles down the road with the Google+ team, where we continue to focus on creating delightful experiences for our users, developers, and publishers.

Meebo’s products are no longer available, but you can find all the latest and greatest things that we’re working on at https://developers.google.com/+/.

 

What the fuck?

That made me think about all the old timers, starting in the 1990s, when there were tons of awesome games by tons of small companies. Hell, I was working for one of those myself, called Blue Byte in Germany. We didn’t get rich (well, except some of us who were clever enough to jump ship with the right stuff in their pockets), but we had a lot of fun. New ideas were flying around, some even made it into new games; of course there was bitching and bickering, everybody wanted to be the blessed one whose ideas were realized in the next game, and so on. And then the company was sold to Ubi Soft, the owner disappeared with a (rumored) 12 million dollars and went to America, where he set up an airport catering service. Well, can’t blame him for taking the payout over the daily sorrows.

But what became of Blue Byte? They are now the Ubi Soft placeholder in Germany, doing translations and publishing, and they continue to crank out the latest version of the one big money spinner Blue Byte had: The Settlers. What version are they at now? 5? 12? I have no idea, but in general it’s a C&C-like game and it went like this:

Version 1: to make a soldier you need a sword and a shield.

Version 2: to make a soldier you need a sword, a shield and a helmet.

Version 3: to make a soldier you need a sword, a shield, a helmet and a bag of provisions.

and so on, ad nauseam. Now, I’m not trying to bash them; everybody wants to make money, and if the population is so stupid as to buy the next version when it is only marginally better than the previous one, all the better for them. And they are not alone. Many, many, many companies have gone the way of the dodo. Origin, my favorite developer, is now a zombie of Electronic Arts, and so are Bullfrog and many others. Sony has swallowed a lot too.

I guess it always went like this: they had some good success, and then the big guys came and said, well, we can either destroy them, which costs money and time and is uncertain, or we offer to buy the company from the owner and he can retire to an island; then we close it down or keep it alive as a brand. And who could resist when somebody offers you some millions and you don’t have to worry anymore about how you pay next month’s salaries? So again, I can’t blame those people who sold their companies.

Fast forward to 2014.

What is the games market situation now? I have NO clue. And why is that? Because about 10 years ago I stopped browsing the game aisles at my local Fry’s store or stopping to visit a game store in the mall. It’s all the same. Shoot Em Up Version 132 with better graphics, more explosions and harder monsters. Or dumbfucker games where you shove little candies or bubbles back and forth. Or multiplayer games where you can have even bigger raids than in the previous one. Did I forget any genre? I don’t think so, because most gaming genres are dead today. I can’t remember when I last saw a C&C-like game, or an adventure, or a flight simulator, or a space combat simulator, or a car racing game other than Mario Kart.

And for the rest of us, it’s either buying the manure that the big companies feed to us, or shutting up and playing some retro games.

And the same is happening in the OS and application world. There are no alternatives on the desktop market to Windows or Mac OS. Sure, Linux is there, but it is still difficult to handle, and one of today’s typical buyers, who doesn’t even know whether RAM is measured in Gigabytes or Square Werst Per Glasen, surely can’t handle Linux. And when MS or Apple, or Google in the mobile market, demand that you hand over your privacy to them, what are you going to do? Just say no? Yeah, right; you people can’t even live 5 minutes without your stupid devices and without texting while driving. Hell, at the airport I saw an idiot in a toilet directing the flow of piss with one hand while texting with the other!

So with most OS and application development in a few big hands, what are the alternatives? I’m afraid there are none. I know I will try to use Linux as much as possible, but there are limits. I will have to use Microsoft’s Visual Studio for development (again, there are no other compilers anymore because MS destroyed all the small manufacturers), and it only runs on guess what OS? And in order to use the latest version you have to use the latest Windows version, simply because they made it so that it refuses to run on an older one.

It’s a brave new IT world and I don’t think it is going to be a pleasant one.


C++ And STL Exporting Mess

There is a big problem with the C++ STL when exporting objects that use instantiations of STL templates. One good article that explains the problem well is here. I read about this elsewhere and fought it myself. The bottom line is: there seems to be no solution to this problem. It is an inherent design flaw of the C++ language. Alas, the STL uses templates extensively, and when you try to use the STL in a DLL you write yourself, you’re screwed.
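
As a minimal illustration of the pattern (my own sketch, with hypothetical names such as MYLIB_API): as soon as an exported class has an STL member, MSVC emits warning C4251, because the template is instantiated separately inside the DLL and inside every client, and the two instantiations need not be compatible:

// mylib.h -- hedged sketch of the problem, not a solution.
// MYLIB_EXPORTS and MYLIB_API are hypothetical names.
#include <string>
#include <vector>

#ifdef MYLIB_EXPORTS
#  define MYLIB_API __declspec(dllexport)
#else
#  define MYLIB_API __declspec(dllimport)
#endif

// MSVC warns here with C4251: the member's type needs a
// dll-interface to be usable by clients of class Logger.
class MYLIB_API Logger
{
public:
  void add (const std::string &line);
private:
  std::vector<std::string> lines_;  // STL type crossing the DLL boundary
};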

Ada uses so-called “generics”, which are similar to templates, but you don’t have those problems there. The Ada language is way better designed than C++, perhaps because it’s not just made by one crazy Danish guy. However, Ada suffers from a lack of really good tools. I worked on an Ada project for the past two months, and the fight with the Linux GDB debugger and the GPS IDE got more and more frustrating. Perhaps if I had more time I would be able to figure out how to use SlickEdit with Ada (there must be a way), but I haven’t got the time right now. If there were better tool and IDE support, I am convinced I would drop C++ for Ada most of the time. As things are, Visual Studio is more comfortable, and thus it’s better to accept the flaws in C++.

Back to the original problem. The only hint I found so far is to avoid DLLs altogether and use a static library, which of course won’t work when your library has internal state variables and you use it in several DLLs, because then you have multiple instantiations of your state variables. Bleh.

So what I’m trying next for my tools library project is to split it into two parts. One holds the bare-bones essentials with all the state variables and is a DLL; this one avoids all STL references in its interface. The other contains utility code that uses the STL (for instance a FORMAT command for strings, similar to sprintf) and will be a static library.

The problem is, of course, that on top of all this I also have to deal with the shitty C4 graphics library, which was designed in the spirit of “it is all you ever need”, so linking it with any third-party library, including the STL, is tricky to say the least.

We’ll see how that goes.

 

USE clause in Ada programs, bad or good?

As a newcomer to Ada, I’m trying to bring my 25 years of experience in other programming languages to the table while learning Ada, and to not repeat common mistakes. I also like to form my own opinion on questions that have already been discussed in the Ada community. Sometimes people who have been involved with a problem for too long don’t see the forest because all the trees are in the way.

So, my question today is: to use a USE clause or not?

In Ada, you “import” or bring a library unit into scope with the WITH clause. So, for instance, when you want to use Ada.Text_IO, you write:

WITH Ada.Text_IO;

in your code. That enables you to write lines like

Ada.Text_IO.Put_Line ("blahblah");

to print a line to the console. If you want to avoid having to write the qualifier, then you need to write

WITH Ada.Text_IO;  USE Ada.Text_IO;

and then you can simply write

Put_Line ("blahblah");

without the qualifier. An Ada package can roughly be compared to a C++ namespace. In fact, I was so taken by Ada packages that I started to use namespaces in my C++ programs for the first time ever about 2 years ago. My current project is nested about 5 levels deep with namespaces, and it really helps to organize C++ classes into problem domains and keep the project clear, as the sketch below shows. That is something I haven’t seen much in the C++ world. Usually, when a third-party library is obtained, it uses its own namespace, but only one, which then contains all its data; logical separation between domains is mostly done with prefixes. I think deeply nested namespaces are much better there.
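
Here is that sketch (hypothetical names, my own example):

// Organizing C++ classes by problem domain with nested
// namespaces instead of name prefixes.
namespace Game { namespace Engine { namespace Render {
  class Mesh { /* ... */ };
} } }

// A call site can qualify fully...
Game::Engine::Render::Mesh mesh;

// ...or shorten the chain with a namespace alias, the C++
// counterpart of Ada's "renames" discussed below:
namespace Render = Game::Engine::Render;
Render::Mesh other;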

I am also aware of the school of thought that says USE clauses are a bad thing. One should include the domain prefixes whenever using an identifier from a namespace or package, so that it is clear where the procedure or data element originates. Then, if you want to know where Put_Line() comes from, you don’t have to look any further than the prefix to see the complete package name. Also, when several packages export the same identifier, it is clear which one you call. I do that with namespaces too.

Instead, they recommend renaming a prefix chain, so that for instance Ada.Text_IO becomes TIO and you write TIO.Put_Line instead.
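
In code, that looks like this (my own example):

WITH Ada.Text_IO;

PROCEDURE Demo IS
   PACKAGE TIO RENAMES Ada.Text_IO;
BEGIN
   TIO.Put_Line ("blahblah");
END Demo;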

Instead of using the C++ directive

using namespace XXX;

I always write

XXX::Do_Something();

Now, Visual Studio supports you greatly there. You see the namespaces in the object browser, pull them into the editor with the mouse, and it copies the whole namespace chain along with the identifier. However, the Ada IDE I’m using, AdaCore’s GPS, doesn’t do this. And SlickEdit, which I tried out recently, doesn’t do it either. So here you are left typing a whole lot of stuff, not to mention that you have to remember the exact names you used for packages or namespaces.

However, people always tell you, with the same severity with which they tell you “gotos are bad”, that “USE clauses are bad”, for the reasons mentioned above.

 

Well, I disagree with them now.

The reason is that since I started working with Ada and attempting to utilize techniques of Structured Design, I tend to make many more, smaller packages or programming units than before. When you have packages that contain only one or two exported subprograms, there isn’t so much confusion anymore. You import far fewer foreign packages than you would in a long package that has many subprograms from different domains, and thus you narrow down the possible sources of an identifier in the first place.

Secondly, when a decorated identifier alone stretches over 80 characters in a line, you don’t really make things clearer. The human mind can only process so much information at once, and a super-long identifier with five package names as a prefix is hard to grasp. Together with the capability of modern IDEs to “go to definition” for any identifier, I hardly think there is a need for decorated identifiers anymore. And as for ambiguity: you can always decorate a name when multiple packages export the same one.

Finally, renaming a lengthy package name sequence to a short one still means you have to prefix every identifier with it and remember the names. If you do that for 10 packages in the same source file, you now have to remember 10 abbreviations instead of 10 package names, which you would otherwise know by heart since you use them everywhere. Worse, you now have a local abbreviation which may be different in different source files that use the same library units.

So my conclusion and the technique I utilize is:

Use USE for every imported library unit.

It makes a program clearer, more compact, less cluttered, and generally makes your life easier without giving up anything. Use short source files with it, and you will have no disadvantages.


To Except Or Not To Except

That’s an age-old question in programming: whether to use exceptions or return values to inform the caller about an error condition. I won’t go into too much detail here. Look at this Dr. Dobb’s article for an example of the discussion, or just type something like “Exceptions vs Return Values” into a search engine (preferably Bing; we don’t like Google too much over here, but then we don’t like Microsoft either 🙂).

Anyway, the main arguments for exceptions are:

  • They make the code cleaner, since no error checking is necessary after each function call.
  • They can signal failure from code like C++ constructors, which cannot return a value otherwise.

And probably a lot of other stuff.

Arguments against them mostly center around the fact that they are very costly due to stack unwinding and so on.

 

So there’s a whole bunch of arguments for and against both ways, and all of them are somewhat relevant; otherwise we would not still be discussing this some 30 years after the invention of exceptions, and it would have been resolved one way or the other long ago. It sometimes reminds me of the old Jewish joke where a man comes to a rabbi and complains about another man, and the rabbi says, “You are right.” Then comes the man he complained about and complains about the first man, and the rabbi again says, “You are right.” “But Moshe,” says his wife, “they can’t possibly both be right.” Says the rabbi to his wife: “You are right too.”

Well, jokes aside, I don’t think there is a way to decide for one or the other, and most people prefer a mix of both: return values for one type of error, exceptions for other types. But where to draw the line, that is the real question. One piece of advice that I read a lot is this:

Use Return values for error conditions that are expected, like a file not found, and use Exceptions for those that should not occur normally and are exceptional, like Out Of Memory.

That sounds reasonable. Except that it doesn’t solve the most important problem, which is the most valuable argument for exceptions in my eyes: the cluttering of code. There are probably a lot of error conditions that are to be expected. Let me tell you about my experience. I hardly ever use exceptions, but that is due to my conservative approach. I don’t like new developments except when they are really, really, really advantageous, and exceptions did not seem that way to me, so I continued to use return values mostly.

In my experience, most error conditions are to be expected!

When I look over my current game code, there is hardly any error I return where I would say: that really should not happen. And when there is, I use an ASSERT to point it out during debugging, and then I somehow either return an error or raise an exception. Not very clean.

I think we should instead judge errors based on whether the calling code could most likely continue despite the condition or not.

Let’s look at the previous example again: you want to read, say, a configuration from a file, or a text to be shown in an editor window. Either way, the file does not exist. Or say the file exists but is corrupt. Can your code continue under these circumstances? The answer is no. If the file cannot be found, or if it can be found but subsequent read commands fail, the code that uses the file can most likely not continue. And perhaps several levels of calling code cannot continue either, because often it makes no sense to handle an error where it occurs, but rather at a much higher level. When you have code that reads a configuration file and you descend like 4 function call levels deep to read the next token, and that function detects that the file is corrupt, you don’t want to pop up a message dialog there. Even if you did, you would still have to report the error to the top so that the main code can terminate.

So, yes, exceptions cost some time in terms of stack unwinding. But does it really matter? A function that your program was supposed to carry out just failed; there are probably bigger problems than a few microseconds lost to stack unwinding. And here exceptions really shine, because you don’t have to check each level of the call chain for an error that occurred somewhere buried five levels deeper.

So I’m switching to heavy use of exceptions in my new code. I think exceptions make code clearer, more readable and more maintainable by moving error handling to one section toward the end of a subprogram or coding section. Unless we are in a loop that’s supposed to run 5 million times, where every microsecond counts and half of the calls are expected to fail, I see no problem with a little time spent.

Return values are still valuable, not so much for “errors” as for “alternative outcomes”, where the calling code can still continue, just in another way. For instance, say the configuration-reading code above detects that a certain entry is not present in the configuration file and supplies a default value instead. That would be an opportunity for a return value, as in the sketch below.
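
Here is how I would draw that line in practice (hypothetical configuration-reading code, my own example, not from any real project): the unreadable or corrupt file throws, because the caller cannot continue; the missing entry is handled by a plain return value, because the caller can:

#include <fstream>
#include <map>
#include <stdexcept>
#include <string>

typedef std::map<std::string, std::string> Config;

// Unreadable or corrupt file: the caller cannot meaningfully
// continue, so throw and let a much higher level deal with it.
Config load_config (const std::string &path)
{
  std::ifstream in (path.c_str ());
  if (!in)
    throw std::runtime_error ("cannot open " + path);

  Config cfg;
  std::string line;
  while (std::getline (in, line))
    {
      std::string::size_type eq = line.find ('=');
      if (eq == std::string::npos)
        throw std::runtime_error (path + ": corrupt line: " + line);
      cfg[line.substr (0, eq)] = line.substr (eq + 1);
    }
  return cfg;
}

// Missing entry: an expected alternative outcome, so a plain
// return value with a caller-supplied default does the job.
std::string config_value (const Config &cfg, const std::string &key,
                          const std::string &fallback)
{
  Config::const_iterator it = cfg.find (key);
  return it != cfg.end () ? it->second : fallback;
}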

Always ask yourself: can the code that uses a function continue after a condition occurs, or not?

That should be the determining factor for or against an exception, in my eyes.