Human economic decisions are characterized by a number of factors which make them difficult to model with standard mathematical tools. Decisions can be more easily described by a set of rules, and some of them may be rules of thumb. Economic behaviour is adaptive, in that people are able to adjust to a changing environment. It is argued in this paper that the classifier system framework is a suitable means of modeling human economic decisions. A case of a simple economic decision of finding an optimal price is discussed, which is later made more complex by introducing an input variable that effects the optimal price. It is shown that classifier systems can be used in both tasks, and their performance is compared to human decisions in the same set of circumstances.
Keywords: classifier systems, adaptive agents, economic modeling, economic simulation.
There are several reasons why one would want to model the economic decisions made by humans:
Obviously, the human decision making apparatus is very complex. Our brain consists of a vast quantity of neurons and connections between them, and while their basic electro-chemical function is gradually understood, there is probably a long way to an authoritative answer to the question of how people perceive their environment, think about it and arrive at decisions on what to do next. Rather than undertaking the project of simulating such complex thought processes as seem to occur in human beings, should one not confine oneself to study and model the behaviour of much simpler organisms? This bottom-up approach has been advocated [Brooks86a], and the idea has taken roots in the field of artificial life [Steels95], where many projects aim for simulating the behavior of organisms of comparatively low complexity, such as insects (see [Langton89a] and [Meyer/Wilson]).
In this paper, an economic robot and its environment are defined. This artificial creature is then equipped with a guidance system that should enable it to avoid bumping into objects as it travels along: economic variables like price and revenue are perceived through an input device and encoded into a format which the guidance system can process. An output device picks data from the system and lets the robot perform actions in the environment. The guidance system is implemented by a classifier system.
Classifier systems have been introduced by John Holland in 1975 (see [Holland92] and [Holland95]). A typical classifier system essentially works with binary strings and consists of input detector, message list, rule base, output detector (effector), genetic algorithm, and bucket brigade (see Fig 1.) A processing cycle in a simple classifier system works as follows: the input detector filters data from the environment and writes binary strings composed of 0 and 1 onto the message list. Typically this list has a fixed size. The rule base contains rules made of condition and action part. A rule is a string of 0, 1 and # (any). The condition part of a rule matches a given message if the bits at the corresponding positions are equal, or the condition has a # at that position, which means match all; e.g., the message 101 is matched by the condition 1## but not by #11. The system tries to match the condition parts of the rules with the messages on the message list and records possible matches. Rules fire when their condition part is satisfied by a message. Their action part is applied to the message and the result put back on the message list. Application of an action part means: if the action symbol is 0 or 1, then write that symbol; if it is #, then write the symbol of the corresponding position in the message; e.g., applying the action 0## to the message 101 results in the new message 001.
If more than one rule matches a given message, a rule to fire is chosen randomly, where the probability of a given rule to be chosen corresponds to its weight. In the first simulation step, the rule base starts with uniform weights. Rules acquire additional weight by directly causing actions with positive reward from the environment, or by getting paid by other rules: when a rule fires on a message produced previously by another rule, it has to pay a certain (fixed) percentage of its weight to that rule. This bucket brigade ensures that not only those rules that actually produce positive feedback from the environment are strengthened, but also the stage setting rules that allow others to fire by posting the appropriate messages to the message list.
Figure 1: A Holland Classifier System
After the matching and firing, the output detector searches for messages encoding actions, typically identified by a special prefix like 11. If no such messages are present, a new rule is generated and added to the rule base which produces such an action message. Again, if more than one action message is present for a certain type of action, one is chosen randomly according to the weights of the rules that posted them. The message is then decoded and the appropriate action is performed in the environment, such as setting a price.
Figure 2: Recombination of classifiers by the genetic algorithm. The offspring classifier inherits parts from both parents.
In order to effectively search the large space of possible rules, a genetic algorithm is applied to the rule base at certain intervals in the simulation. Rules are selected randomly according to their weight, and a crossover operator similar to biological systems is applied for producing offspring from the parents. A mutation operator is applied with low probability and flips a random bit in the rule. Fig.2 shows the recombination of a new rule from to existing parents: the offspring rule receives parts from each parent. Mutation can (with low probability) flip a bit in the offspring. The idea behind the process is to recombine features which have proven successful in the past (reflected by a high classifier weight).
Obviously, APL is very well suited for implementing the array-processing involved in the operation of a classifier system as described above. Special care has been taken to avoid unneccessary looping in order to get the maximum performance, without sacrificing readability of code. The classifier system described here (see [Geyer-Schulz95] for a different implementation) uses a number of parameters, which are referred to in the code below under the following names:
| ms | message size |
| nm | number of messages in the message list |
| ml | message list, a nm × ms numeric matrix |
| nr | number of rules in the rule base |
| rb | rule base, a nr × (2 · ms) numeric matrix |
| we | the vector of the rule weights |
| prod | a vector of indices of the rules that produced the messages currently on the message list |
The idea in this implementation of the matching process is the distance of symbols: 0, #, and 1 are numerically encoded as 1, 2, and 3, resp. The matching process then essentially is a subtraction: the message 101 of the above example becomes 3 1 3. The condition part 1## becomes 3 2 2. Subtract and take absolute value; if none is greater than 1 then it is a match:
^/1|3 1 3-3 2 2 (1)
1
This can be interpreted as in order to match, the distance of two symbols should not be greater than 1.
With a numeric message and rule encoding, computing the merger r of a message m = 3 1 3 and a rule action part a = 3 2 1 is easy: the instruction is if the i-th rule symbol is a #, take the corresponding message symbol; otherwise, take the rule symbol. In other words,
This can be written as follows, using multiplication and addition to express the condition:
r(m×a=2)+a×a¬2 (3)
r
3 1 1
In the following I will describe the components of the classifier system, and at the end of the section they will be put together to form a working guidance system for a robot in an economic environment.
The input detector binary encodes the environment state s, represented by a single variable (the season, see sections 4 and 5 below). The result of the encoding is a bit vector prefixed with two zeros and written on top of the message list ml.
mlml inputd s
[1] ml[1;]1 1,(1 3)[1+((ms-2)½2)(2*ms-2)×s]
The function match implements the matching procedure described above, along with the appropriate reshaping that glues together all the possible message-rule combinations, in order to compute all matches at once. The result is a boolean table that describes which message is matched by which rule condition.
rml match rb;nr;nm;ms
[1] nr1½rb
[2] nm1½ml
[3] ms¯1½ml³^/1
[4] r³^/1|((nr,nm,ns)½ml)-(nr,nm,ns)½rb[,(¼nr)°+nm½0;¼ms]
The function fire takes the boolean matrix of possible matches from match and chooses rules to fire. It also manages the payments between rules. Special care has to be taken not to try to pay to messages posted by the input detector, since there can be no corresponding rule which posted such a message. Bids and payments are five percent of current rule weight.
zfire b;f;w;s;i;j;bid
[1] w(1÷1++/2=rb[;¼ms])×we×10*6
[2] f+\b×(nm,nr)½w
[3] s?1/f
[4] z1++/^\f³(nr,nm)½s
[5] © pay bid
[6] i(znr)/z
[7] bid0.05×we[i]
[8] we[i]0we[i]-bid
[9] © receive bid
[10] j(znr)/prod
[11] © don't pay to inputs
[12] bid(j¬0)/bid
[13] j(j¬0)/j
[14] we[j]we[j]+bid
Output values for the decision variable are decoded from the message list with the decode operator. Output messages are expected to be prefixed with two 1s. If no such messages are found, a rule is generated and inserted into the rule base that generates such a message, which is than taken for further processing. Otherwise, an output message is selected randomly from those found on the message list, where the probability of an output message being chosen corresponds to the weight of the rule that produced it (the vector prod keeps track of which rules produced the messages currently on the message list; the variable we contains the weights of all rules).
acoutputd;m;r;a;w;p
[1] outpnm½0
[2] b(ml[;1 2]^.=3 3)^prod¬0
[3]
(0=/b)/gen
[4] mi(b/¼nm)[1 dice we[b/prod]]
[5] m,ml[mi;]
[6] genr0
[7] dec:ac((ms-2)½2)3=(2m)
[8] outp[mi]1
[9] rprod[mi]
[10]
0
[11] gen:mi?nm
[12] rb[r?nr;]ml[mi;],a3 3,?(ms-2)½3
[13] rb[r;?ms]2
[14] mml[mi;]ml[mi;]merge a
[15] prod[mi]r
[16] genrr
[17]
(Log<2)/dec
[18] 'Generated rule ',r
[19]
dec
The genetic algorithm searching the rule space is implemented in the function ga. The variable we holds the weights of the rules and is used by the dice function (see below) to choose rules randomly according to those weights. The parents i and j are used in the crossover process: elements of i are copied from start up to the crossover point p into the child k, the remaining rule elements are copied from parent j. Mutation is applied by randomly generating one rule element (0, #, or 1 in its numerical encoding 1, 2, or 3). The position k of the child within the rule base is always chosen randomly according to inverse weight: less successful rules are more likely to be replaced.
rbt ga rb;i;j;k;p;c
[1] p?c¯1½rb
[2]
(0¬2|t)/deal
[3] idice we
[4] jdice we
[5] kdice 1÷we
[6] rb[k;p¼c]rb[i;p¼c]
[7] rb[k;p¼c]rb[j;p¼c]
[8] rb[k;?c]?3
[9] we[k]mean we
[10]
0
[11] deal:ij0
[12] garkdice 1÷we
[13] rb[k;]?c½3
[14] we[k]mean we
Weighted random selection
The function dice chooses y elements of the weight vector supplied as argument. The indices of the elements chosen are returned.
idice x;s;m;n
[1] n½x
[2] s+x
[3] ss×n÷¯1s
[4] m¯1s
[5] i1((¯1+?m)<s)/¼n
The main function uses the components described above to implement the classifier system.
hmain x;t;h;cy;b;f;ac;rew;s
[1] init x
[2] t1
[3] h¼0
[4] next_t:cyCycles
[5] rbt ga orbrb
[6] ml¯1´ml
[7] prod¯1²prod
[8] omlmlml inputd senvenv envs t
[9] cycle:cycy-1
[10] bml match rb
[11] ffire b
[12] mlml merge rb actions f
[13]
((~/ml[;1 2]^.=3 3)^cy>0)/cycle
[14] acoutputd
[15] hh,rewt world ac
[16] prod[(fnr)/¼nm](fnr)/f
[17] wewe updatew rew,outp/prod
[18] log
[19]
(Ttt+1)/next_t
The following situation is analogous to that described previously [Grimm95]; inputs to the decision process are:
| pt-1 | price asked in the previous period |
| qt-1 | quantity produced and sold |
| Ct-1 | total cost of last period |
| Rt-1 | profit or loss incurred |
| Mt | current capital |
| St | current season |
The season St = 1 is held constant in this setting; it has no effect and can be ignored here (but will become important in a later experiment). The only variable for the agent to set is the price pt, and the result of this decision is the reward Rt (profit or loss) corresponding to that price, where quantity qt (produced and sold, see assumptions 1 and 2 below) is a linear function of price:
qt = 85 + 40St + (- 10 - 1.1St)pt
The total cost Ct is a polynomial
function of the quantity qt produced:
Ct = 10 + 2qt +
0.03qt2 +
0.05qtv3
The reward R_t is (but see assumption 3 below):
Rt = pt qt - Ct
At the end of time step t, the new capital Mt+1 is:
Mt+1 = Mt + Rt
The simplifying assumptions are:
While this setting may seem oversimplified at first, note that at the start of the search for the optimal price neither the human subjects nor the artificial agents know about the structure of the model, nor of reasonable bounds for the price.
Up to this point in time, only a few preliminary experiments with human participants have been made, which showed that people had no trouble finding the optimal price in this simple setting. In the following typical log the first line is the user input setting the price, the second and third lines show the response from the simulation environment (see simulation environment under Conclusions below):
p 20 Price set to 20.000000 p 20.00 q 0.00 C 10.00 R -10.00 M 3225.81 S 1.00 p 15 Price set to 15.000000 p 15.00 q 0.00 C 10.00 R -10.00 M 3215.81 S 1.00 p 10 Price set to 10.000000 p 10.00 q 14.00 C 181.08 R -41.08 M 3082.57 S 1.00 p 13 Price set to 13.000000 p 13.00 q 0.00 C 10.00 R -10.00 M 3021.49 S 1.00 p 12 Price set to 12.000000 p 12.00 q 0.00 C 10.00 R -10.00 M 3011.49 S 1.00 p 11 Price set to 11.000000 p 11.00 q 2.90 C 17.27 R 14.63 M 3016.12 S 1.00 p 10 Price set to 10.000000 p 10.00 q 14.00 C 181.08 R -41.08 M 2963.22 S 1.00 p 11 Price set to 11.000000 p 11.00 q 2.90 C 17.27 R 14.63 M 2977.85 S 1.00 p 11.00 q 2.90 C 17.27 R 14.63 M 2992.47 S 1.00
Between 10 and 20 trials were necessary for human players to find a p value that optimizes the return R. In the next section, we will explore how the classifier system performed in this environment.
Figure 3: The optimal Price is fixed in time. The mean reward of ten runs with different random number generator start settings soon enters the positive region.
The number of time steps needed to obtain a reasonable performance may seem large, but with a message length of six, there are approx. 500,000 possible rules. The genetic algorithm has to search this space for appropriate ones, without having an understanding of the problem at hand.
In this experiment, the additional input variable S is not fixed, but changes with time t:
It has to be recognized as an important input variable by the agent: no fixed price will obtain a positive overall outcome over several time steps, since the season changes with a sine function with time, and the demand function and therefore the quantity sold vary with it. A successful agent has to vary its price in a certain range (from 8 to 11), following the slow change of the season variable.
The scenario looks like the following to a human participant in the simulation environment:
p 10 Price set to 10.000000 p 10.00 q 0.00 C 10.00 R -10.00 M 9990.00 S 0.12 p 10.00 q 0.00 C 10.00 R -10.00 M 9980.00 S 0.02 p 10.00 q 0.00 C 10.00 R -10.00 M 9970.00 S 0.00 p 10.00 q 0.00 C 10.00 R -10.00 M 9960.00 S 0.06 p 10.00 q 0.00 C 10.00 R -10.00 M 9950.00 S 0.18 p 10.00 q 0.00 C 10.00 R -10.00 M 9940.00 S 0.36 p 10.00 q 1.19 C 12.51 R -0.61 M 9939.39 S 0.56 p 10.00 q 6.66 C 39.46 R 27.18 M 9966.58 S 0.75 p 10.00 q 11.01 C 102.35 R 7.73 M 9974.31 S 0.90 p 10.00 q 0.00 C 10.00 R -10.00 M 9964.31 S 0.98
The subject started producing by setting a price (first line). The simulation then reports the result of the current settings: price p, quantity (produced and sold) q, total cost C, return R, remaining money M, and the crucial season variable S. Note that when the simulation is running, there is some adjustable time span of 5 or 10 seconds between each report line, with the subject being able to enter new p values anytime. If the subject does not give any input, the old p setting remains in use.
In order to achieve an overall positive return, variation of the price with the change of season is necessary: in the log below a price adjustment was done whenever a drop in return was noticed:
p 8 Price set to 8.000000 p 8.00 q 8.12 C 54.99 R 9.97 M 10035.63 S 0.10 p 8.00 q 0.00 C 10.00 R -10.00 M 10025.63 S 0.25 p 9 Price set to 9.000000 p 9.00 q 8.14 C 55.24 R 18.02 M 10043.66 S 0.44 p 9.00 q 0.00 C 10.00 R -10.00 M 10033.66 S 0.63 p 10 Price set to 10.000000 p 10.00 q 8.54 C 60.34 R 25.01 M 10058.66 S 0.81 p 10.00 q 12.24 C 130.62 R -8.24 M 10050.42 S 0.94 p 11 Price set to 11.000000 p 11.00 q 2.83 C 17.04 R 14.12 M 10064.54 S 1.00
Within one half of the seasonal cycle, the price has to be varied from 8 to 11 and back for the other half of the cycle. With better timing, losses can be avoided altogether. Equations describing near-optimal behaviour are of the form 8+3S.
Only a few informal experiments have been made up to this point in time, which nevertheless showed that this problem was considerably more difficult than the previous one. The subjects took about 100 trials until they unterstood the influence of the season variable. This number will probably be much higher with subjects who had no experience with the simpler situation described above.
Fig.4 shows that the it took the classifier system somewhat longer (cf. fig.3) to achieve positive overall performance. A total of 1000 time steps were simulated; 10 steps form a period, and the reward (again the mean of ten runs) depicted is the average over those steps. The mean reward shows an upward trend, whereas the standard deviation does not change remarkably during the simulation.
Figure 4: The optimal price varies with the season. The mean value of the reward enters the positive region at period 60 (after 600 time-steps)
The experiments performed up to this point suggest that classifier systems are a viable tool for modeling human economic decision making in the situations described. Although the learning speed of the classifier system may seem unsatisfying, one has to take into account that the CS does not have any understanding of the optimization problem, whereas the human subjects were aware of the fact that they engaged in a simple economic simulation, and were able to apply appropriate behaviour, e. g. they correctly expected that the price/profit function in the simple setting had only one peak, and could direct their search correspondingly.
Future Work will center on the design of experiments. The simulation environment is currently being extended to include a variety of economic decision situations; currently, human participants can connect to the simulation environment via telnet. [We invite you to take a look at this virtual world: telnet force.wu-wien.ac.at 4000. The project is documented at http://www.wu-wien.ac.at/usr/ai/mitloehn/pecunia]
This environment was originally developed for text-based adventure games [Curtis/Nichols93]. We are working on a web interface using java applets to provide a more comfortable interaction and attract more players from the internet, who can take part in experiments or the fantasy adventure game that is part of the simulation environment as well. The result of their performance in the economic decision tasks will be rewarded within the adventure game. This way we hope to ensure that decisions are not made lightly, since it is known in experimental economics [Kagel/Roth95] that data from experiments where subjects do not have real incentive for good performance is of no value in modeling real economic behaviour. We expect several benefits from this approach:
In order to connect robots to the simulation, an interface to the virtual world is easily developed in APL, since the communication is based on the exchange of lines of text in a simple format. Classifier systems, along with other adaptive models [Grimm/Mitlöhner95a] can then be used to model the subjects who took part in the simulation games. Using this approach we hope to develop models of adaptive human behaviour that form the base for plausible representations of economies and human decision making.
Johann Mitlöhner
Vienna University of Economics
Dept. of Applied Computer Science
Augasse 2-6
A-1090 Vienna
Austria
[Brooks86a] Rodney A. Brooks. A robust layered control system for a mobile robot. IEEE J. Robotics and Automation, RA-2:14--23, April 1986.
[Curtis/Nichols93] Pavel Curtis and David A. Nichols. Muds grow up: Social virtual reality in the real world. Published on the Internet as file MUDsGrowUp.txt, available e.g. at http://www.wu-wien.ac.at/usr/ai/mitloehn, January 1993.
[Geyer-Schulz95] Andreas Geyer-Schulz. Holland classifier systems. APL Quote Quad, 25(4), June 1995. APL'95 Conference Proceedings.
[Grimm/Mitlöhner95a] T. Grimm and J. Mitlöhner. Bounded rationality and adaptive agents in economic modeling. APL Quote Quad, 25(4), June 1995.
[Grimm95] Thomas Grimm and Johann Mitlöhner. Bounded rationality and adaptive agents in economic modeling.em APL Quote Quad}, 25(4), June 1995. APL'95 Conference Proceedings.
[Holland92] John H. Holland. Adaptation in Natural and Artificial Systems. MIT Press, Cambridge, Massachusetts, 1992.
[Holland95] John H. Holland. Hidden Order. Addison-Wesley, Reading, Massachusetts, 1995.
[Kagel/Roth95] John H. Kagel and Alvin E. Roth. The Handbook of Experimental Economics. Princeton University Press, Princeton, New Jersey, 1995.
[Langton89a] Christopher G. Langton, editor. Artificial Life. Santa Fe Institute, Addison-Wesley, 1989.
[Meyer/Wilson] Jean-Arcady Meyer and Stewart W. Wilson, editors. From Animals to Animats - Proceedings of the First International Conference on Simulation of Adaptive Behaviour. MIT Press, Cambridge, Massachusetts, 1991.
[Steels95] Luc Steels, editor. The artificial life route to artificial intelligence : building embodied, situated agents. Erlbaum, Hillsdale, 1995.
[Sterman89a] John D. Sterman. Modeling managerial behaviour: misperceptions of feedback in a dynamic decision making environment. Management Science, 35(3), March 1989.