Coherence of Data and Code. Is it a mistake?

Just a brief reflection on something that has puzzled me for quite a while now.

I learned programming the old way: Basic, Assembler, Pascal, Modula-2, C and then C++. That’s my path. Until I got to C++ (with which I have been stuck for 20 years or so), the principle of keeping data and code together wasn’t really a topic. In fact, there was little that forced you to keep them together until C++ became popular.

Today, it’s a religious dogma. You must keep the data and the code together in a class. Don’t make your data elements public, or anybody can access them! Implement accessor functions instead. Nobody doubts this these days, but I have run into situations where this dogma is literally holding me back. Let me explain.

Making client-server games the old-fashioned way (meaning with a dedicated client application, not a web browser) presents the following challenge: often you have to define data that is used on both the client and the server. But of course, the code won’t fit. The code that accesses the data on the server often refers to other objects or code that exist only on the server and therefore cannot be compiled or linked into the client, and vice versa.

So in my previous game I used the #ifdef directive extensively to make those classes compile one way on the server and another way on the client. This is of course a terrible crutch.

In my current game I came up with the solution to have a shared link library that holds the data. Then I add classes on the client and the server which have this data structure as a private member and then provide the separate access and computation routines for the client and the server.
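A minimal sketch of this arrangement, with made-up names (UnitData, ServerUnit and ClientUnit are only for illustration, they are not the classes from my actual game). The struct would live in the shared library, and each side wraps it with its own routines:

```cpp
#include <string>

// --- Shared library: plain data only, no behaviour (hypothetical names). ---
struct UnitData {
    int id = 0;
    int hit_points = 0;
    float x = 0.0f;
    float y = 0.0f;
};

// --- Server side: holds the data privately and adds server-only logic. ---
class ServerUnit {
public:
    explicit ServerUnit(const UnitData& data) : data_(data) {}

    // Authoritative game logic that only the server needs.
    void ApplyDamage(int amount) {
        data_.hit_points -= amount;
        if (data_.hit_points < 0) data_.hit_points = 0;
    }

    // What gets sent to the client is just the shared struct.
    const UnitData& Snapshot() const { return data_; }

private:
    UnitData data_;
};

// --- Client side: same data, different access and computation routines. ---
class ClientUnit {
public:
    explicit ClientUnit(const UnitData& data) : data_(data) {}

    // Overwrite the local state with what the server reported.
    void ApplySnapshot(const UnitData& data) { data_ = data; }

    // Presentation-only code that the server never has to compile.
    std::string HealthLabel() const {
        return "HP: " + std::to_string(data_.hit_points);
    }

private:
    UnitData data_;
};
```

Neither wrapper has to know about the other one, so no #ifdef is needed. The price is that the data layout is effectively public to both sides.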

But this simply tells me that the dogmatic “unity of code and data” is not something natural. I am sure there are other examples where code and data should be separate.

Currently there is no good solution, but I just want to submit as a suggestion that the religious view that code and data are inseparable may be false.

How to Receive Sourcecode Location From Flex/Bison

I finally found a page that contained working code that returns the lexer position in the source code in case of an error. It was damn hard to find any good information, mostly because Flex and Bison are two tools that are typical for C programs of 1970: shitty.

I am copying the page here in case the original goes down.



 

Advanced Use of Flex

In this section we will develop a scanner for arithmetics, which will later be used together with a Bison generated parser to implement an alternative implementation of M4’s eval builtin, see Bison, ylparse.y (FIXME: Ref Bison, ylparse.y.). Our project is composed of:

yleval.h a header common to all the files,
ylscan.l the scanner for arithmetics
ylparse.y the parser for arithmetics (FIXME: ref.).
yleval.c the driver for the whole module (FIXME: ref.).

Because locations are extremely important in error messages, we will look for absolute preciseness: we will not only track the line and column where a token starts, but also where it ends. Maintaining them by hand is tedious and error prone, so we will insert actions at appropriate places for Flex to maintain them for us. We will rely on Bison’s notion of location:

typedef struct yyltype
{
  int first_line, first_column, last_line, last_column;
} yyltype;

which we will handle thanks to the following macros:

LOCATION_RESET (location) Macro
Initialize the location: first and last cursor are set to the first line, first column.
LOCATION_LINES (location, num) Macro
Advance the end cursor by num lines, and of course reset its column. A macro LOCATION_COLUMN is less needed, since it would consist simply of increasing the last_column member.
LOCATION_STEP (location) Macro
Move the start cursor to the end cursor. This is used when we read a new token. For instance, denoting the start cursor S and the end cursor E, we move from

      1000 + 1000
      ^  ^
      S  E

to

      1000 + 1000
         ^
        S=E

LOCATION_PRINT (file, location) Macro
Output a human-readable representation of the location to the stream file. This hairy macro aims at providing simple locations by factoring common parts: if the start and end cursors are on two different lines, it produces 1.1-2.3; otherwise if the location is wider than a single character it produces 1.1-3, and finally, if the location designates a single character, it results in 1.1.

Their code is part of yleval.h:

/* Each macro is wrapped in do { } while (0) so that it behaves as a
   single statement, even inside an unbraced if/else.  */

/* Initialize LOC. */
# define LOCATION_RESET(Loc)                                    \
  do {                                                          \
    (Loc).first_column = (Loc).first_line = 1;                  \
    (Loc).last_column = (Loc).last_line = 1;                    \
  } while (0)

/* Advance of NUM lines. */
# define LOCATION_LINES(Loc, Num)                               \
  do {                                                          \
    (Loc).last_column = 1;                                      \
    (Loc).last_line += Num;                                     \
  } while (0)

/* Restart: move the first cursor to the last position. */
# define LOCATION_STEP(Loc)                                     \
  do {                                                          \
    (Loc).first_column = (Loc).last_column;                     \
    (Loc).first_line = (Loc).last_line;                         \
  } while (0)

/* Output LOC on the stream OUT. */
# define LOCATION_PRINT(Out, Loc)                               \
  do {                                                          \
    if ((Loc).first_line != (Loc).last_line)                    \
      fprintf (Out, "%d.%d-%d.%d",                              \
               (Loc).first_line, (Loc).first_column,            \
               (Loc).last_line, (Loc).last_column - 1);         \
    else if ((Loc).first_column < (Loc).last_column - 1)        \
      fprintf (Out, "%d.%d-%d", (Loc).first_line,               \
               (Loc).first_column, (Loc).last_column - 1);      \
    else                                                        \
      fprintf (Out, "%d.%d", (Loc).first_line,                  \
               (Loc).first_column);                             \
  } while (0)

Example 6.14: yleval.h (i) -- Handling Locations
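To check the three output shapes without a scanner, the printing logic can be sketched as a plain function. This is only an illustration: the names Location and FormatLocation are made up here and are not part of yleval.h, and the function returns a string instead of writing to a stream so that it is easy to test. As in the macros, last_column is taken to point one past the last character.

```cpp
#include <cstdio>
#include <string>

// Same four members as the yyltype struct; last_column is one past the end.
struct Location {
    int first_line, first_column, last_line, last_column;
};

// Hypothetical function counterpart of LOCATION_PRINT.
std::string FormatLocation(const Location& loc) {
    char buf[64];  // large enough for four ints plus separators
    if (loc.first_line != loc.last_line) {
        // Spans several lines: print both cursors in full.
        std::snprintf(buf, sizeof buf, "%d.%d-%d.%d",
                      loc.first_line, loc.first_column,
                      loc.last_line, loc.last_column - 1);
    } else if (loc.first_column < loc.last_column - 1) {
        // Same line, more than one character: factor out the line number.
        std::snprintf(buf, sizeof buf, "%d.%d-%d",
                      loc.first_line, loc.first_column, loc.last_column - 1);
    } else {
        // A single character.
        std::snprintf(buf, sizeof buf, "%d.%d",
                      loc.first_line, loc.first_column);
    }
    return buf;
}
```

With these assumptions, a token starting at line 1, column 1 and ending at line 2, column 3 has last_column equal to 4 and is formatted as 1.1-2.3.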

 

Because we want to remain in the yleval_ name space, we will use %option prefix, but this will also rename the output file. Because we use Automake which expects flex to behave like Lex, we use %option outfile to restore the Lex behavior.

%option debug nodefault noyywrap nounput
%option prefix="yleval_" outfile="lex.yy.c"

%{
#if HAVE_CONFIG_H
#  include <config.h>
#endif
#include <m4module.h>
#include "yleval.h"
#include "ylparse.h"
Example 6.15: ylscan.l -- Scanning Arithmetics

 

Our strategy to track locations is simple, see Flex Actions. Each time yylex is invoked, we move the first cursor to the last position thanks to the user-yylex-prologue. Each time a rule is matched, we advance the ending cursor by yyleng characters, except for the rule matching a new line. This is performed thanks to YY_USER_ACTION. Each time we read insignificant characters, such as white space, we also move the first cursor to the latest position. This is done in the regular actions:

/* Each time we match a string, move the end cursor to its end. */
#define YY_USER_ACTION  yylloc->last_column += yyleng;
%}
%%
%{
  /* At each yylex invocation, mark the current position as the
     start of the next token.  */
  LOCATION_STEP (*yylloc);
%}
  /*  Skip the blanks, i.e., let the first cursor pass over them.  */
[\t ]+     LOCATION_STEP (*yylloc);
\n+        LOCATION_LINES (*yylloc, yyleng); LOCATION_STEP (*yylloc);

The case of the keywords is straightforward and boring:

"+"        return PLUS;
"-"        return MINUS;
"*"        return TIMES;
...

Integers are more interesting: we use strtol to convert a string of digits into an integer. The result is stored into the member number of the variable yylval, provided by Bison via ylparse.h. We support four syntaxes: 10 is decimal (equal to… 10), 0b10 is binary (2), 010 is octal (8), and 0x10 is hexadecimal (16). Notice the risk of reading 010 as a decimal number with the naive pattern [0-9]+; you can either improve the regular expression, or rely on the order of the rules. We chose the latter: when two rules match the same text, Flex picks the one listed first, so the octal rule must come before the decimal one.

  /* Binary numbers. */
0b[01]+   yylval->number = strtol (yytext + 2, NULL, 2); return NUMBER;
  /* Octal numbers. */
0[0-7]+   yylval->number = strtol (yytext + 1, NULL, 8); return NUMBER;
  /* Decimal numbers. */
[0-9]+    yylval->number = strtol (yytext, NULL, 10); return NUMBER;
  /* Hexadecimal numbers. */
0x[[:xdigit:]]+ yylval->number = strtol (yytext + 2, NULL, 16); return NUMBER;
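The conversions done by these four rules can be tried outside of Flex with a small stand-alone function. This is a sketch only: ParseNumber is a hypothetical helper, not part of the scanner, and it assumes its argument is a well-formed literal that one of the rules above would have matched.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper mirroring the binary, octal, decimal and hex rules.
long ParseNumber(const char* text) {
    // "0b..." : skip the two-character prefix and read base 2.
    if (std::strncmp(text, "0b", 2) == 0)
        return std::strtol(text + 2, nullptr, 2);
    // "0x..." : skip the two-character prefix and read base 16.
    if (std::strncmp(text, "0x", 2) == 0)
        return std::strtol(text + 2, nullptr, 16);
    // A leading zero followed only by digits 0-7 is octal, as in 0[0-7]+.
    if (text[0] == '0' && text[1] != '\0' &&
        std::strspn(text + 1, "01234567") == std::strlen(text + 1))
        return std::strtol(text + 1, nullptr, 8);
    // Everything else made of digits falls through to decimal.
    return std::strtol(text, nullptr, 10);
}
```

The octal test deliberately rejects inputs such as 018: in the scanner the decimal rule gives the longer match there, so the text is read as decimal 18.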

Finally, we include a catch-all rule for invalid characters: report an error but do not return any token. In other words, invalid characters are neutralized by the scanner:

  /* Catch all the alien characters. */
.   {
      yleval_error (yycontrol, yylloc, "invalid character: %c", *yytext);
      LOCATION_STEP(*yylloc);
    }
%%

where yleval_error is a variadic function (as is fprintf) and yycontrol is a variable, both of which will be defined later.

This scanner is complete; it merely lacks its partner: the parser. But this is yet another chapter…

The Compiler Builder Toolkit

There is an Open Source project out there, LLVM, that aspires to replace GCC as the predominant Open Source compiler system. And while that is still a long way down the road, their C++ compiler Clang is already very good and the whole toolkit is used extensively in Apple system software.

What’s good about this is that the Intermediate Representation (IR), a sort of assembly language for a virtual RISC processor, is well documented and allows even people with little experience to build their own compiler. Although it will probably not be competing with Visual C++ anytime soon, it’s a good way to learn about compilers.

If you’re into this sort of stuff, check it out.

 

C++ And STL Exporting Mess

There is a big problem with the C++ STL when exporting objects that use instantiations of STL templates from a DLL. One good article that explains the problem well is here. I read about this elsewhere and fought it myself. The bottom line is: there seems to be no solution to this problem. It is an inherent design flaw of the C++ language. Alas, the STL uses templates extensively, and when you try to use the STL in a DLL you write yourself, you’re screwed.

Ada uses so-called “generics”, which are similar to templates, but you don’t have those problems there. The Ada language is way better designed than C++, perhaps because it’s not just made by one crazy Danish guy. However, Ada suffers from a lack of really good tools. I worked on an Ada project for the past two months, and the fight with the GDB debugger on Linux and the GPS IDE got more and more frustrating. Perhaps if I had more time I would be able to figure out how to use SlickEdit with Ada (there must be a way), but I haven’t got the time right now. If there was better tool and IDE support, I am convinced I would drop C++ for Ada most of the time. As things are, Visual Studio is more comfortable and thus it’s better to accept the flaws in C++.

Back to the original problem. The only hint I found so far is to avoid DLLs altogether and use a static library, which of course won’t work when your library has internal status variables and you link it into several DLLs, because then each DLL gets its own copy of your status variables. Bleh.

So what I’m trying next for my tools library project is to split it into two parts. One holds the bare-bones essentials with all the status variables and is a DLL. It will avoid all STL references in its interface. The other contains tool functions that use the STL (for instance a FORMAT command for strings similar to sprintf) and will be a static library.
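Roughly, the split could look like the sketch below. All names here (TOOLS_API, TOOLS_BUILD_DLL, TOOLS_USE_DLL, ToolsSetAppName, ToolsGetAppName, GetAppName) are invented for illustration and are not from my actual library. For brevity both parts are shown in one file. The DLL part keeps the state and exposes only plain C types, and the static part adds the STL convenience on top:

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical export macro; expands to nothing outside of a Windows DLL build.
#if defined(_WIN32) && defined(TOOLS_BUILD_DLL)
#  define TOOLS_API __declspec(dllexport)
#elif defined(_WIN32) && defined(TOOLS_USE_DLL)
#  define TOOLS_API __declspec(dllimport)
#else
#  define TOOLS_API
#endif

// --- DLL part: the single copy of the status variables lives here. ---
namespace {
// Using the STL internally is fine, it just must not appear in the interface.
std::string g_app_name = "unnamed";
}

extern "C" TOOLS_API void ToolsSetAppName(const char* name) {
    g_app_name = name ? name : "";
}

// Copies the name into a caller-provided buffer (truncating if necessary)
// and returns the full length of the name, not counting the terminator.
extern "C" TOOLS_API std::size_t ToolsGetAppName(char* buffer, std::size_t size) {
    if (buffer && size > 0) {
        std::size_t n = g_app_name.size() < size - 1 ? g_app_name.size() : size - 1;
        std::memcpy(buffer, g_app_name.data(), n);
        buffer[n] = '\0';
    }
    return g_app_name.size();
}

// --- Static library part: STL wrapper compiled into each client module. ---
inline std::string GetAppName() {
    // First call asks for the length, second call fetches the characters.
    std::vector<char> buffer(ToolsGetAppName(nullptr, 0) + 1);
    ToolsGetAppName(buffer.data(), buffer.size());
    return std::string(buffer.data());
}
```

Since only const char*, char* and std::size_t cross the DLL boundary, no STL template instantiation has to be exported. The std::string is created inside whatever module calls GetAppName.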

The problem is of course that on top of all this, I also have to deal with the shitty C4 graphic library which was designed in the spirit that “it is all you ever need” so linking it with any third party library, including the STL, is tricky to say the least.

We’ll see how that goes.