Evaluating arithmetic expressions

Or computing the result of 1+2*3-(7*7*9)+1-(2*(2+5-1)) to -445, without using "eval".

In languages missing evaluation capabilities, evaluating arbitrary expressions is not necessarily straight forward, because expressions written in infix form can't be intuitively parsed or evaluated. The quickest solution is probably embedding a scripting engine at runtime and using the scripting engine to evaluate the expression but this is not always feasible.

This post describes an alternate solution, that is parsing the expression into Reverse Polish Notation and evaluating the RPN form, one operator at a time.


Our infix expression parser requires three steps:
- parsing the expression into tokens.
- infix to RPN (reverse polish notation) (we'll do this using the Shunting-yard algorithm).
- evaluate the RPN expression.

We first define the token types our expression will be parsed into.
There's a limited set of operators used in our example: addition, subtraction, multiplication and division. We'll also support parenthesis for defining groups.


------------------------- defs.h ------------------------
typedef enum {
ENDFILE, ERROR,
LPAREN, RPAREN,
OPPLUS, OPMINUS,
OPMUL, OPDIV,
CONSTNUMBER} TokenType;


We'll parse the expression using (f)lex so we need to define our tokens for lex to parse:

-------------------------- infix.lex ----------------------
%{
#import <Cocoa/Cocoa.h>
#import "defs.h"
#import "Token.h"

%}

digit [0-9]
number {digit}+

whitespace [ \t]+
newline \n
%%
{number} {return CONSTNUMBER;}
"+" {return OPPLUS;}
"-" {return OPMINUS;}
"*" {return OPMUL;}
"/" {return OPDIV;}
"(" {return LPAREN;}
")" {return RPAREN;}
{whitespace} {/* skip whitespace */}
{newline} {return ENDFILE;}
. {return ERROR;}
%%
Token * getToken()
{
Token * t = [[[Token alloc] init] autorelease];
[t setType:yylex()];
[t setValue:[NSString stringWithCString:yytext encoding:NSASCIIStringEncoding]];

return t;
}


In the lex definition file, we define the token types based on the regular expressions they match. The getToken function will be repeatedly called to create and consume tokens. The token itself has two attributes, the token type (as returned by yylex()) and the corresponding token value (yytext).

We'll use a very simple ObjC class to store and manipulate our tokens.



------------------------- Token.h ------------------------
#import <Cocoa/Cocoa.h>
#import "defs.h"

@interface Token: NSObject
{
TokenType type;
NSString * value;
}
@property TokenType type;
@property (retain) NSString * value;
- (BOOL) isOperator;
- (int) precendence;

+ (Token *) eval:(Token *)larg :(Token *)op :(Token *)rarg;




------------------------- Token.m ------------------------
#import "Token.h"

@implementation Token

@synthesize type;
@synthesize value;

- (BOOL) isOperator
{
if(type == OPPLUS || type == OPMUL || type == OPDIV || type == OPMINUS)
return YES;
return NO;
}
- (int) precendence
{
if(type == OPPLUS || type == OPMINUS)
return 10;
if(type == OPMUL || type == OPDIV)
return 100;
return 0;
}

- (NSString *) description
{
return value;
}

+ (Token *) eval:(Token *)larg :(Token *)op :(Token *)rarg
{
NSInteger lval = [[larg value] integerValue];
NSInteger rval = [[rarg value] integerValue];

Token * r = [[[Token alloc] init] autorelease];
[r setType:CONSTNUMBER];

int res;

if([op type] == OPMUL)
res = lval * rval;
else if([op type] == OPDIV)
res = lval / rval;
else if([op type] == OPPLUS)
res = lval + rval;
else if([op type] == OPMINUS)
res = lval - rval;

NSString * s = [NSString stringWithFormat:@"%d", res];
[r setValue:s];
return r;
}



The Evaluate class defines most of the logic. It consumes tokens following the Shunting-yard algorithm.


------------------------- Evaluate.m ------------------------
#import "Evaluate.h"
#import "defs.h"
#import "Token.h"
@implementation Evaluate

extern Token * getToken();

-(void) run
{

NSMutableArray * outputQueue = [NSMutableArray arrayWithCapacity:10];
NSMutableArray * operatorStack = [NSMutableArray arrayWithCapacity:10];

Token * t = getToken();
while([t type] != ENDFILE)
{
if([t type] == ERROR)
{
NSLog(@"Error parsing token: %@, halting", [t value]);
break;
}
else if([t type] == CONSTNUMBER)
{
NSLog(@"Adding constant: %@", [t value]);
[outputQueue addObject:t];
}
else if([t isOperator])
{
NSLog(@"Adding operator: %@", [t value]);
while([operatorStack count] && [[operatorStack objectAtIndex:0] isOperator])
{
Token * topOfStack = [operatorStack objectAtIndex:0];
if([t precendence] <= [topOfStack precendence])
{
NSLog(@"Popping operator %@ off the stack", topOfStack);
[outputQueue addObject:topOfStack];
[operatorStack removeObjectAtIndex:0];
}
else
{
NSLog(@"Not popping off the stack, top is %@, current op is %@", topOfStack, t);
break;
}
}
[operatorStack insertObject:t atIndex:0];
}
else if([t type] == LPAREN)
{
NSLog(@"Adding operator: %@", t);
[operatorStack insertObject:t atIndex:0];
}
else if([t type] == RPAREN)
{
NSLog(@"Adding operator: %@", [t value]);
while([operatorStack count] && [(Token *)[operatorStack objectAtIndex:0] type] != LPAREN)
{
NSLog(@"Popping LPAREN off the stack");
Token * topOfStack = [operatorStack objectAtIndex:0];
[outputQueue addObject:topOfStack];
[operatorStack removeObjectAtIndex:0];
}
if([operatorStack count] && [(Token *)[operatorStack objectAtIndex:0] type] == LPAREN)
[operatorStack removeObjectAtIndex:0];
else
NSLog(@"Missing LPAREN");

}

t = getToken();
}

if([operatorStack count] && ([(Token *)[operatorStack objectAtIndex:0] type] == LPAREN ||
[(Token *)[operatorStack objectAtIndex:0] type] == RPAREN))
NSLog(@"Mismatched paren");

int i=0;
for(i=0;i<[operatorStack count];i++)
{
if([(Token *)[operatorStack objectAtIndex:i] type] == LPAREN ||
[(Token *)[operatorStack objectAtIndex:i] type] == RPAREN)
{
NSLog(@"ERROR: Non operator on stack");
return;
}
[outputQueue addObject:[operatorStack objectAtIndex:i]];
}

NSLog(@"Output queue: %@", outputQueue);

NSLog(@"Trying to eval expression from queue");
NSMutableArray * rpStack = [NSMutableArray arrayWithCapacity:10];
for(i=0;i<[outputQueue count];i++)
{
Token * ct = (Token *)[outputQueue objectAtIndex:i];
if([ct type] == CONSTNUMBER)
{
NSLog(@"Consuming numeric token: %@", ct);
[rpStack addObject:ct];
}
if([ct isOperator])
{
NSLog(@"Consuming operator token: %@", ct);
/* pop two args off of the stack*/
if([rpStack count] < 2)
{
NSLog(@"Error: Operator %@ is missing arguments", t);
return;
}
Token * t1 = [rpStack objectAtIndex:[rpStack count]-1];
[rpStack removeLastObject];
Token * t2 = [rpStack objectAtIndex:[rpStack count]-1];
[rpStack removeLastObject];

Token * eval = [Token eval:t2 :ct :t1];
NSLog(@"Eval: %@ %@ %@ result: %@", t2, ct, t1, eval);
[rpStack addObject:eval];
}
}

if([rpStack count] != 1)
{
NSLog(@"ERROR: too many values");
return;
}

NSLog(@"RP stack: %@", rpStack);

}
@end