Mind, Part 5: Less Mind & Mindlessness

A) Just a Rat in a Cage

As discussed last time, algorithmic programs run into problems when it comes to replicating the mind. One prominent defect is the need to factor in changes that go beyond programming, viz. learning. If we take for granted that the brain and mental experience have plasticity, then we need to account for an ability to learn. For programs there are theories of machine learning, and for organisms there are theories of behaviorism. The latter still exists, buried deep in universities, and the former is currently en vogue. Both have ardent defenders, and so a discussion needs to be had.

For behaviorism, the intent is to be as empirical as possible, constructing theories such that controlling variables are completely definable and measurable. This allows learning theory to boil down to a few terms that can then be manipulated. There are stimuli that impinge upon the sensory organs and that can be linked with reinforced respondents, viz. elicited reflexive responses, or operants, viz. responses emitted without an obvious stimulus. By reinforcement, one means that the rate of response is strengthened, viz. a longer time is needed to extinguish the reinforced behavior back to a pre-conditioned state. After reinforcing behavior through successive approximations, the end result should be learning, or a capacity to adapt one’s behavior in a new environment as a result of prior experience. And if this learning response demonstrates smooth and reproducible curves, then the coupling of stimulus and response is deemed to have become the new determined pattern, as per the law of effect.
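
The law of effect can be caricatured in a few lines of code. The sketch below is a deliberately toy model, not drawn from any actual behaviorist formalism: the baseline, update rates, and extinction cutoff are all invented for illustration. Reinforcement raises the probability of emitting a response; withholding reinforcement decays it back toward the pre-conditioned baseline.

```python
import random

random.seed(42)  # make the demo deterministic

class Operant:
    """Toy caricature of the law of effect. Reinforcement raises the
    probability of emitting a response; extinction decays it back
    toward the pre-conditioned baseline. All rates are invented."""

    def __init__(self, baseline=0.1):
        self.baseline = baseline
        self.strength = baseline  # current probability of responding

    def trial(self, reinforced):
        emitted = random.random() < self.strength
        if emitted and reinforced:
            # strengthen toward 1.0 on each reinforced emission
            self.strength += 0.2 * (1.0 - self.strength)
        elif not reinforced:
            # extinguish: decay toward the baseline rate
            self.strength -= 0.1 * (self.strength - self.baseline)
        return emitted

lever = Operant()
for _ in range(50):                    # conditioning phase
    lever.trial(reinforced=True)
conditioned = lever.strength           # strength after reinforcement

trials_to_extinguish = 0               # extinction phase
while lever.strength > lever.baseline + 0.01:
    lever.trial(reinforced=False)
    trials_to_extinguish += 1
```

On this toy model, the more strongly a response has been reinforced, the more extinction trials are needed to return it to baseline, which is all that "reinforcement" is permitted to mean.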

If one takes a firm hold of this law, then once again one proceeds by monitoring only input and output states. This is satisfactory for those who see algorithmic programming as problematic but still desire deterministic mechanisms in their theories. Those in the field can concern themselves only with input and output states, for the sake of parsimony as per Morgan’s Canon. It also allows bypassing of programming that sets initial conditions. This is accomplished by starting from the environment and letting that milieu create new needs and actions. In effect, this means learning can be as wide as the environment, instead of merely what was initially programmed.

This emphasis on environmental stimuli and lawful responses also means that with proper operational definitions and sufficient data, intentionality can be reduced to a behavioral disposition. As put by C.L. Hull, “pure stimulus acts are the physical substance of ideas”. Thus, a behavior such as sex can be described in terms of stimuli and response, without need to posit decisive intervening states such as libido. Inner mental states may exist, but they are as malleable and predictable as the outer stimuli and responses. To put it another way, in the words of B.F. Skinner: “it would be foolish to deny the existence of that private world, but it is also foolish to assert that because it is private it is of a different nature from the world outside.” 

Thus behaviors that used to be explained in terms of S-O-R (stimulus-organism-response) are now described in terms of S-R (stimulus-response). And so we have yet another theory of mind seeking to explain mental states by making them ineffective and fictitious.

“[I’m] struck by how often in 20th-century behavioral science methodological nuttiness got in the way. Why…did psychology feel compelled to embark on its investigations by tying one hand behind its back and using the other to shoot itself in the foot? Didn’t problems about the mind seem hard enough to bear without adding a freight of procedural inhibitions?” – Jerry Fodor
All these premises have been expanded upon in machine learning. For someone like Daniel Dennett, “the brain is certainly not a digital computer running binary code, but it is still a kind of computer”. This is because while traditional computer programming operates on preformed data and algorithms to produce an output, machine learning involves looking at data and outputs and produces an algorithm. In the words of Pedro Domingos: “We can think of machine learning as the inverse of programming, in the same way that the square root is the inverse of the square, or integration is the inverse of differentiation.” This inversion can be accomplished in different ways, with Domingos dividing the current methods of machine learning into five “tribes”: Symbolists, Bayesians, Analogizers, Evolutionaries, and Connectionists. As summarized by Dennett: “except for the creations of the Symbolists, [the tribes] are bottom-up, needle-in-haystack-finding repetitive churnings”. As we have previously discussed GOFAI and their descendants the Symbolists, we need to now consider the remaining “tribes” under various aspects of learning theory as they work through their “repetitive churnings.”
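
Domingos’ inversion can be illustrated with a deliberately tiny, hypothetical example. The "programmed" version below encodes a rule by hand; the "learned" version recovers the same rule from data–output pairs, here by a trivial closed-form least-squares fit standing in for the far more elaborate "churnings" of the five tribes.

```python
# Programming: rule in, outputs out.
def programmed(x):
    return 2 * x + 1          # the algorithm is written by hand

# Learning: data and outputs in, rule out.
def learned(pairs):
    """Fit y = a*x + b to (x, y) pairs by ordinary least squares."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda x: a * x + b   # the algorithm is produced from data

data = [(x, programmed(x)) for x in range(5)]
recovered = learned(data)        # recovers the hand-written rule
```

The two functions agree on unseen inputs, which is the sense in which learning "inverts" programming: one starts from the rule, the other ends at it.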

Importantly, organizing a system on learning principles may help in certain ways. Computationally hard, but solvable, NP problems are being addressed by the brute force of machine learners. And learning does seem to be part of human mental function, whether in areas of the brain rewired following commissurotomy or stroke, or in the treatment of OCD. So in cases of learning, surely, there is a reasonableness to positing rewards as determining motivation. How far can we take that reasoning?

B) Empty Terms for Empty Tabulas

Problems begin to emerge when the only motivation considered is reward. Worse still if all motivations are reduced to rewards or their opposite. Then this becomes empty phrasing and begets empty theorizing upon our tabula rasa subjects.

As part of following their preferred methodology, behaviorists want to reduce acts down to observable terms alone. This is done out of a fear of creating a homunculus out of mental acts. Dennett puts it thus: “No satisfactory psychological theory can rest on any use of intentional idioms, for their use presupposes rationality, which is the very thing psychology is supposed to explain. So if there is progress in psychology, it will inevitably be, as Skinner suggests, in the direction of eliminating ultimate appeals to beliefs, desires, and other intentional items from our explanations.” Skinner himself thinks the homunculus cannot be incorporated into a theory, and so it must be removed: “Nor can we escape…by breaking the little man into pieces and dealing with his wishes, cognitions, motives, and so on, bit by bit. The objection is not that those things are mental but that they offer no real explanation and stand in the way of a more effective analysis.”

“[Behaviorism is] a ridiculous conception of the mind: the idea that there’s nothing going on in there, except you have the stimulus input and behavioral output.” – John Searle
For methodological reasons, this may be permissible, much as similar reasoning was permissible for the functionalist when explaining the Turing test. However, similar reasoning yields similar problems. If we embrace the behaviorist methodology and rely heavily on the terms of the theory, then serious problems emerge with those terms.

For one thing, the terms could be circular and lose coherency even before they attempt to correspond to reality. As put by Charles Taylor:

“The problem of what is to be considered a reinforcer is a vexed one. Some writers have complained that the use of this concept involves one in a circle: a reinforcer is a state of affairs which strengthens a connection, while a connection is strengthened by a reinforcing state of affairs.”

Thus the explicandum is made equal to the explicans. Behaviorists will counter that the reasoning may be rationally incoherent, but that it remains empirical, as it is determined by observations of probabilities and frequencies of response. This faces another problem of circularity, however, as noted by Noam Chomsky:

“Frequency is a very misleading measure of strength, since, for example, the frequency of a response may be ‘primarily attributable to the frequency of occurrence of controlling variables’. It is not clear how the frequency of a response can be attributable to anything but the frequency of occurrence of its controlling variables if we accept Skinner’s view that the behavior occurring in a given situation is ‘fully determined’ by the relevant controlling variables.”

Even the most traditionally presumed examples providing support for learning theory, such as memory, may be unable to escape this circularity:

“The appeal to memory presupposes what it is meant to explain, namely, the articulation of the givens, the imposing of a sense onto the sensible chaos. The evocation of memory becomes superfluous the moment that it is made possible, since the work that we expect from it has thus already been accomplished.” – Maurice Merleau-Ponty

In the abstract this is to say that an organism remembering the past and acting accordingly in the present is quite difficult, if not impossible, to separate from an organism simply responding appropriately in the now. In a concrete example:

“We have no principle of counting for types of families of [behavioral] sets. When an animal goes forward and then two yards to the right, is there a different set operating from the case where he goes five yards to the right? And if so, where is the boundary between these two?” – Charles Taylor

Alternatively, even when clarified, the terms may simply not correspond with reality. If we use Dennett’s heterophenomenology, we are committed to privileging a third-person objective account over one’s immediate experience. This becomes, literally, a bad joke in the following:

After two behaviorists have sex, one asks the other: “It was good for you. How was it for me?”

Or, the terms could be made true, but trivially so. This would mean they avoid circularity, but they also forfeit any usefulness to the field:

“Consider…the basic principle that Skinner calls the ‘law of conditioning’. It reads: ‘if the occurrence of an operant is followed by presence of a reinforcing stimulus, the strength is increased’. As reinforcement was defined, this law becomes a tautology. For Skinner, learning is just change in response strength.” – Noam Chomsky

For a term to be true and useful in a field of inquiry, it needs to demonstrate falsifiability. This is clearly not the case for many verbal responses, as pointed out by Chomsky:

“The looseness of the term reinforcement…makes it entirely pointless to inquire into the truth or falsity of this claim. We can reinforce someone by emitting verbal behavior as such (since this rules out a class of aversive stimulations), by not emitting verbal behavior (keeping silent and paying attention), or by acting appropriately on some future occasion. From this sample, it can be seen that the notion of reinforcement has totally lost whatever objective meaning it may ever have had. Running through these examples, we see that a person can be reinforced though he emits no response at all, and that the reinforcing stimulus need not impinge on the reinforced person or need not even exist (it is sufficient that it be imagined or hoped for).”

Those opposing behaviorism would argue that the functionality, however limited, of these terms is due to their derivative, parasitic relation to common folk-psychology parlance:

“[Skinner’s] extrapolation of the notion of probability can best be interpreted as, in effect, nothing more than a decision to use the word PROBABILITY, with its favorable connotations of objectivity, as a cover term to paraphrase such low-status words as INTEREST, INTENTION, BELIEF, and the like…The phrase ‘X is reinforced by Y’ is being used as a cover term for ‘X wants Y,’ ‘X likes Y,’ ‘X wishes that Y were the case,’ etc.” – Noam Chomsky

This is because the behaviorist looks only at objective causal mechanisms, refusing to grant mental states any causative power. To work from a common notion of actions would imply mental states have causative powers, aiming towards notions such as goals:

“We are simply adopting the ordinary language notion of action. But in doing this we are not only introducing a teleological element, but we are abandoning the claim to explain learning in terms of conditioning” – Charles Taylor

If that’s the case, then why stick to this methodology? Why not simply consider this a bad approach and let it die?

“Behaviorism really is dead. Even fancy, sophisticated, philosophical behaviorism really is dead. And the kind of behaviorism that seeks to impose epistemic constraints on the ontology of psychological theories is especially dead.” – Jerry Fodor

C) Ring My Bell

The starting point for behaviorism is “classical” conditioning and the story of Pavlov’s dog. This behavior was limited to “reflexes” initially before the experimental paradigm expanded. There is still use of classical conditioning in terms of reflexes and pathologies, such as psychoneuroimmunology and addiction cues. These are laudable findings.

“Philosophers turned a methodological program into a metaphysical lunacy. They started to deny the very existence of consciousness. [Behaviorism] is the craziest thing that has ever been said in the whole history of philosophy.” – Galen Strawson
These principles can also be mimicked by Bayesian algorithms. These algorithms find statistical regularities, perform operations on them when a sufficient threshold value is reached, and rank those operations in a hierarchical network of operations to perform. Dennett goes so far as to say that “the brain can be a Bayesian expectation-generator without the affordances thereby tracked and exploited being owned.” On the basis of affordances such as expectations and probabilities, and methods such as MCMC, a Bayesian machine can learn to take action based on certain probabilities. This is fine for a behaviorist who theorizes that probabilities, rather than beliefs, alter output. As put by Skinner: “Beliefs, preferences, perceptions, needs, purposes, and opinions are possessions of autonomous man which are said to change when we change minds. What is changed in each case is a probability of action.”
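
Skinner’s “probability of action” can be given a toy Bayesian reading. The sketch below is hypothetical throughout: the likelihoods, threshold, and observation stream are invented for illustration, and a single Bayes-rule update stands in for the hierarchical machinery and MCMC methods mentioned above. A belief is updated over a stream of binary observations, and a “response” is emitted once the posterior clears a threshold, with no belief-talk required beyond the number itself.

```python
def posterior(prior, p_obs_given_h, p_obs_given_not_h, observed):
    """One step of Bayes' rule for a binary hypothesis H and a
    binary observation e: P(H|e) = P(e|H)P(H) / P(e)."""
    if observed:
        num = p_obs_given_h * prior
        denom = num + p_obs_given_not_h * (1 - prior)
    else:
        num = (1 - p_obs_given_h) * prior
        denom = num + (1 - p_obs_given_not_h) * (1 - prior)
    return num / denom

belief = 0.5                 # flat prior: no expectation either way
THRESHOLD = 0.95             # act once the posterior clears this
observations = [True, True, False, True, True, True]

emitted = False
for obs in observations:
    belief = posterior(belief, 0.8, 0.3, obs)
    if belief > THRESHOLD:
        emitted = True       # "response" triggered by probability alone
        break
```

The machine here never represents a goal; it merely tracks a probability and acts when it is high enough, which is exactly the behaviorist-friendly picture.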

If past behaviors are supposed to indicate future performance, and there exists a gradient of reinforcement, then why aren’t assertions made more convincing by stimuli of greater intensity? For instance, take the following example:

“If we take this suggestion quite literally, the degree of confirmation of a scientific assertion can be measured as a simple function of the loudness, pitch, and frequency with which it is proclaimed, and a general procedure for increasing its degree of confirmation would be, for instance, to train machine guns on large crowds of people who have been instructed to shout it.” – Noam Chomsky

But this does not correspond to reality. Instead, the appropriate replies available in the face of a given stimulus are both varied and unrelated to response strength:

“For [Skinner], ‘if we are shown a prized work of art and exclaim BEAUTIFUL!, the speed and energy of the response will not be lost on the owner.’ It does not appear totally obvious that in this case the way to impress the owner is to shriek BEAUTIFUL in a loud, high-pitched voice, repeatedly, and with no delay (high response strength). It may be equally effective to look at the picture silently (long delay) and then to murmur BEAUTIFUL in a soft, low-pitched voice (by definition, very low response strength).” – Noam Chomsky

And if all is conditioned by the past, what about past experiences allows an organism to yield new behaviors? Or to respond with improvisation instead of replication?

“Either the animal is able to take a short-cut in any environment of a given shape where he had developed an expectancy chain, so that he could run an indirect path to reward, or that he should be able to do so in none of this shape. His ability should depend purely on the spatial structure of the path and short-cut. For instance, there should be no difference between the animal’s improvisation performance in a closed maze and in an open one respectively of the same shape. And this is not borne out by the evidence.” – Charles Taylor

There should also be a lack of planning. This puzzled Skinner himself when he said: “The future doesn’t exist. How can it affect contemporary human behavior?” After all, future and remote stimuli should make no difference in conditioning, for responding to an object that is not present should have no place in an S-R theory. Yet anticipation and future planning happen all the time:

“Suppose, for example, that while crossing the street I hear someone shout WATCH OUT FOR THE CAR and I jump out of the way. It can hardly be proposed that my jumping was conditioned (that is, I was trained to jump) precisely in order to reinforce the behavior of the speaker.” – Noam Chomsky

The prominent functions of a mind include the ability to register events from the past through remembering, thinking of events in the future with anticipation, and contemplating events that are merely possible. As such, this inability of a theory of mind to handle future events is extremely peculiar.

Similar to this, S-R theories lack an explanation for novel behaviors:

“Suppose I am held up and asked for my wallet. This has never happened to me before, so the correct response cannot have been reinforced for me, yet I do the smart thing: I hand over my wallet. Why? The Skinnerian must claim that this is not truly novel behavior at all, but an instance of a general sort of behavior which has been previously conditioned. But what sort is it? Not only have I not been trained to hand over my wallet to men with guns, I have not been trained to empty my pockets for women with bombs, nor to turn over my possessions to armed entities. None of these things has ever happened to me before. I may have never been threatened before at all. Or more plausibly, it may well be that most often when I have been threatened in the past, the reinforced response was to apologize to the threatener for something I’d said. Obviously, though, when told YOUR MONEY OR YOUR LIFE, I don’t respond by saying I’M SORRY, I TAKE IT ALL BACK. It is perfectly clear that what experience has taught me is that if I want to save my skin, and believe I am being threatened, I should do what I believe my threatener wants me to do.” – Daniel Dennett

To return to the beginning, even in the restricted conditions surrounding Pavlovian reflexes, there are cases that require the postulation of an S-O-R. Certain cognitive processes, e.g. corticolimbic incentive processes, are not mutable via Pavlovian conditioning on the mesolimbic dopamine systems. And those behaviors that can be conditioned are susceptible to an extinction burst, where a reflex increases paradoxically during extinction, or spontaneous recovery, where an extinguished response recurs without re-learning. And starting from the cephalic phase of digestion, the mere thought of food is sufficient to activate vagal nuclei and G cells. These effects seem better explained with the inclusion of mental states under S-O-R descriptions, and lack explanation under simpler S-R conditions.

Considering these cases, it may once again be worthwhile to make human minds more complicated. An easy way to do this is by going from S-R to S-O-R, but I digress…

“[Behaviorists were] stupid enough to think you could build up a whole theory and system of logic about human psychology based entirely on learning and specifically the kind of stimulus-response learning that’s studied in the lab.” – Robert Trivers
For one thing, to account for alterations in behavior, it would seem that directedness needs to be added to one’s theory. A behaviorist would argue that this need not be intentionality, for direction could be sufficiently provided with a map manqué. In other words, if the internal knowledge of the environment grows, if a better map is outlined, then this could constitute learning. This would account for remote objects that are not impinging on receptors. Maybe new “goals” and an “intentional environment” can simply be Bayesian expectations, arranged hierarchically, that would be sufficient to replicate S-O-R behavior.

This would prima facie appear to avoid difficulties in assuming only past historical exposure. However, it runs into those exact same problems with frequencies and likelihoods of success based on past results because it requires PRIORS for posterior probabilities. And those priors require either a derived intentionality in the form of programming (and the problems outlined here in Section I) or an S-O-R entity (and the problems outlined above).
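
The dependence on priors is easy to demonstrate. In this hypothetical sketch (the hit/miss likelihoods and observation count are invented for illustration), two learners receive the identical evidence stream but start from different priors, and end with different posteriors: what the machine "learns" is partly an artifact of what it was handed before any data arrived.

```python
def run(prior, n_obs=3, hit=0.7, miss=0.4):
    """Apply Bayes' rule for n_obs identical positive observations:
    P(H|e) = P(e|H)P(H) / [P(e|H)P(H) + P(e|~H)P(~H)]."""
    for _ in range(n_obs):
        prior = (hit * prior) / (hit * prior + miss * (1 - prior))
    return prior

# Identical evidence, different starting assumptions.
confident_start = run(prior=0.9)   # ends above 0.95
skeptical_start = run(prior=0.1)   # same data, still below 0.5
```

The same three observations push one learner to near certainty and leave the other unconvinced, so the prior itself must come from somewhere: programming or an S-O-R entity.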

Furthermore, while some cognitive processes may indeed rely on drawing a statistical inference, there are clear limitations if it were applied to all processes. A large missing component is systematicity, without which future actions will be limited. As put by Fodor:

“What we mean when we say that linguistic capacities are systematic is that the ability to produce/understand some sentences is intrinsically connected to the ability to produce/understand certain others. You can see the force of this if you compare learning languages the way we really do learn them with learning a language by memorizing an enormous phrase book…You can learn any part of a phrase book without learning the rest…[However] a speaker’s knowledge of his native language is never like that. You don’t find native speakers who know how to say in English that JOHN LOVES THE GIRL but don’t know how to say in English that THE GIRL LOVES JOHN.”

This ability where one’s understanding of the syntax allows one sentence to be as equally comprehended as the reverse is not the same as a statistical recognition of lexical items. It’s a cognitive capacity working on mental representations that isn’t generated by gathering a mass of lists:

“This view of sentence structure… is inadequate. SHEEP PROVIDE WOOL has no (physical) frame at all, but no other arrangement of these words is an English sentence. The sequences FURIOUSLY SLEEP IDEAS GREEN COLORLESS and FRIENDLY YOUNG DOGS SEEM HARMLESS have the same frames, but only one is a sentence of English (similarly, only one of the sequences formed by reading these from back to front). STRUGGLING ARTISTS CAN BE A NUISANCE has the same frame as MARKING PAPERS CAN BE A NUISANCE, but is quite different in sentence structure, as can be seen by replacing CAN BE by IS or ARE in both cases. There are many other similar and equally simple examples. It is evident that more is involved in sentence structure than insertion of lexical items in grammatical frames; no approach to language that fails to take these deeper processes into account can possibly achieve much success in accounting for actual linguistic behavior.” – Noam Chomsky

And this processing by rules should be instantiated even without S-O-R and psycholinguistic extrapolations. This is not limited just to linguistic structures, but generalizes to many logical processes:

“It is not plausible that only the minds of verbal organisms are systematic. Think what it would mean for this to be the case. It would have to be quite usual to find animals capable of representing the state of affairs aRb but incapable of representing the states of affairs bRa. Such animals would be aRb sighted but bRa blind since the representational capacities of its mind affect not just what an organism can think, but also what it can perceive. In consequence, such animals would be able to learn to respond selectively to aRb situations but quite unable to learn to respond selectively to bRa situations. So that, though you could teach the creature to choose the picture with the square larger than the triangle, you couldn’t for the life of you teach it to choose the picture with the triangle larger than the square.” – Jerry Fodor

Verbal or not, this cognitive systematicity is lacking without S-O-R. This is because behaviorism will not admit to mental events being sufficiently real, with the implication that:

“What does not really exist cannot cause anything, and the behaviorist… believes deep down that mental causes do not exist.” – Jerry Fodor

And being nonexistent, mental events cannot causally influence one another:

“In the realm of the mental many examples of event-event causation involve one mental state’s causing another, and for this kind of causation behaviorism provides no analysis. As a result the behaviorist is committed to the tacit and implausible assumption that psychology requires a less robust notion of causation than the physical sciences require.” – Jerry Fodor

Under an S-O-R theory, learning makes a range of responses possible to the organism but does not determine them. The organism determines the response (R) based on the goal (O) being sought after in the current circumstance (S). Thus, training and repetition when exposed to certain environmental stimuli may alter one’s ability to efficiently achieve a goal by changing expectancies, hopefully from incorrect to correct, but it need not change the goal. This means the acts being performed are not statistical or S-R emissions, but occur because a certain mental description of those actions occurs that influences one’s achieving a goal. In other words:

“In all teleological laws, the independent variable (the situation) is characterized in terms of its relation to the dependent variable (the action) as one which ‘calls for’ this action. And this, on [behaviorist] assumptions, is inadmissible.” – Charles Taylor

However, with goals postulated, many problems resolve. For one thing, it explains improvisation:

“When an animal has been trained to emit a certain response, i.e., bring about a certain goal, the goal may not only determine which variation of a given motor pattern will be emitted in the situation, but also which motor pattern will be used. We have already seen this in the case of ‘getting the lever down’, where the rat may use teeth, right paw, left paw, chin, etc. Thus if we train a rat to depress a lever, an action which he usually performs with his teeth, and if we then muzzle him, he will straightaway use his paws.” – Charles Taylor

It also explains novelty in terms of a goal or purpose:

“To take just one example, the response ‘I am looking for my glasses’ is certainly not equivalent to the proposed paraphrases: ‘When I have behaved in this way in the past, I have found my glasses and have then stopped behaving in this way,’ or ‘Circumstances have arisen in which I am inclined to emit any behavior which in the past has led to the discovery of my glasses; such behavior includes the behavior of looking in which I am now engaged.’ One may look for one’s glasses for the first time; or one may emit the same behavior in looking for one’s glasses as in looking for one’s watch, in which case ‘I am looking for my glasses’ and ‘I am looking for my watch’ are equivalent, under the Skinnerian paraphrase. The difficult questions of purposiveness cannot be handled in this superficial manner.” – Noam Chomsky

So it seems one cannot do without internal motivations. Can the nature of those internal motivations fit with behaviorism?

D) Shut Up, it’s about Drives

An animal may be trained to push a lever to get food, but it may not do so if not hungry. In order to avoid positing goals and other states requiring S-O-R, behaviorists appeal to drives, viz. intervening variables standing in for states of activity invoked to fulfill a need. So a reinforcer can be redefined as that which reduces a drive state, and thereby some need whose satisfaction restores bodily homeostasis. This allows for conditions under which an organism learns without needing a schedule of reinforcement, known as latent learning. Things like hunger and thirst condition behavior as elements in a total situation, and thus contribute to the aims of a behavior in place of a mental goal.

But this needs further refinement. First, are drives directed or directionless? In hunger, am I directed towards any kind of food, or a specific type? If there is no direction, as C.L. Hull thought, and if by drive one means simply an arousal state, then this reaches contradiction in the simplest cases:

“The Hullian notion of drive as an absolutely general activator of behavior is close to the absurd. When an animal is satiated for food and sexually aroused, he does not gorge himself.” – Charles Taylor

So then maybe drive is meant to be something like an instinct towards a specific class of stimuli. This, however, leads to a need to multiply new drives to account for behaviors that cannot otherwise be explained. So a rat run through a T-maze will learn the route due to a “curiosity drive.” Such drives can be postulated and manipulated endlessly to match any data. As put by Chomsky:

“In probably every instance an ingenious drive-reduction theorist could find some fragment of fear, insecurity, frustration, or whatever, that he could insist was reduced and hence was reinforcing. The same sort of thing could be said for the ingenious phlogiston or ether theorist.”

The only justification for positing such drives is that learning takes place in these situations, and learning under such conditions cannot otherwise be explained by behaviorism. When the number of drives multiplies to the level of the possible number of desires or goals, why not simply surrender behaviorism and admit goals, desires, intentions, etc. into one’s theory?

“[Behaviorism was] bad poetry disguised as science.” – Julian Jaynes
And if the notion of drive doesn’t deflate to simply become the concept of goals, then at other times it is plainly just false. For example, desire would better explain cases where the intensity of motivation does not correspond to the duration of deprivation or a pent-up need. A classic example is the “salted nuts phenomenon”. To wit:

“It’s a truism that the desire for salted peanuts is directed towards eating salted peanuts. Still, eating salted peanuts doesn’t stop you from wanting to eat more salted peanuts; a fortiori, it doesn’t stop you from scrounging for more. What Shakespeare said of Cleopatra is true of salted peanuts, too: they make hungry where they most satisfy.” – Jerry Fodor

A less trivial example than salted foods is motor learning. It’s hard to see how motor function learning could proceed without having a goal. For instance:

“When a child learns to walk for the first time, after many attempts, in the process, say, of getting a rattle, he does not have to start all over again from scratch in learning the movements needed to get to some other object.” – Charles Taylor

We also see this in social learning. Without any instruction, a child watching his parents using a comb will try to comb his own hair and we can say he performs the act because he finds it reinforcing. But the decision to imitate is made by the individual, as simply seeing the act doesn’t trigger like a reflex does. Rather, the individual, through an S-O-R faculty, decides to pick an identity as a goal and so repeats attendant behaviors.

All in all, this should paint a picture of the human mind as being more than an S-R response in a box…

“They saved appearances by forging expressly occult qualities or faculties which they imagined to be, like little demons or goblins capable of producing unceremoniously that which is demanded, just as if watches marked the hours by a certain horodeictic faculty without having need of wheels, or as if mills crushed grains by a fractive faculty without needing any thing resembling millstones.” – Gottfried Leibniz

But even if one acknowledges motivation that occurs without reinforcement and without external provocation, i.e. intrinsic motives, a behaviorist need not allow the motives to be mental. As argued by Dennett: “behavior is driven principally by a reward system that…is phenomenologically comprised simply of the passions.” Similar to drives, this allows for influencing behaviors by inner components other than thoughts. For instance, emotions can serve as a way to manipulate data and produce certain behavioral responses. Emotions have been excluded from machine intelligence discussions before, since they are not considered algorithmic (see Section F here), but being non-algorithmic is not a problem for learning theorists. As such, emotional machine learning programs have been created, such as SOAR, ACT-R, and LIDA.

For emotions, it still seems necessary to have an S-O-R. Intrinsic motivation involves an end of behavior that is not some separably identifiable reinforcer, viz. it is independent of a history of reinforcement. As put by Taylor:

“To take pleasure is to take pleasure in something, and to enjoy is to enjoy something, and the pleasure or enjoyment is not separably identifiable from what is being enjoyed; that is, we cannot know that we are experiencing pleasure, and be in ignorance of what we are taking pleasure in.”

This is seen in “gratuitous” activities, for example:

“Sometimes we say simply that a given type of activity is desirable in itself, e.g. all forms of play, exercise, cultural pursuits, and so on. These actions are those which we say are not done ‘for any other purpose’, that is, other than themselves.” – Charles Taylor

This would seem to align with common sensibilities:

“Phenomena of this general type are certainly familiar from everyday experience. We recognize people and places to which we have given no particular attention. We can look up something in a book and learn it perfectly well with no other motive than to confute reinforcement theory, or out of boredom, or idle curiosity. Everyone engaged in research must have had the experience of working with feverish and prolonged intensity to write a paper which no one else will read or to solve a problem which no one else thinks important and which will bring no conceivable reward, which may only confirm a general opinion that the researcher is wasting his time on irrelevancies.” – Noam Chomsky

Not all of one’s motives can be instrumental. There must be some things that one cares for just for their own sake. At one point it was even argued that happiness emerges not instrumentally but from the unimpeded flow of desired actions. Again, this reduces to a bad joke:

A lawyer is offered sex by a beautiful woman. He replies “well I guess so, but what’s in it for me?”

Common folk psychology relies on behaviors being a function of desires that select responses pursuant to a certain goal. But there are multiple ways to realize the mental states behind any passion or emotion. Consider anxiety:

“Anxiety is generally not ignorant of what it is anxious about. As a psychic state it has a direction, and an explanation in terms of anxiety is also one in terms of desire [of what to avoid], that is, an explanation of a teleological form.” – Charles Taylor

Or consider fear. Fear is a result of a felt danger. If no danger is felt then no fear is felt. But the need to feel danger in order to register fear is evidence against S-R theories. Put succinctly:

“When a routine is well established and we have confidence in it, we feel no fear. There is generally speaking no fear involved in plugging in a light, although one is taking care not to get a shock.” – Charles Taylor

A peculiar aspect of fear is that it is only loosely related to the physical level of danger. What seems to matter more than the objective level of danger is the feeling, however illusory, of control. This is seen empirically in combat casualties and their etiologies:

“Combat units typically suffer one psychiatric casualty for every physical one, and during Israel’s Yom Kippur War of 1973, frontline casualty rates were roughly consistent with that ratio. But Israeli logistics units, which were subject to far less danger, suffered three psychiatric cases for every physical one. And even frontline troops showed enormous variation in their rate of psychological breakdown. Because many Israeli officers literally led from the front, they were four times more likely to be killed or wounded than their men were and yet they suffered one-fifth the rate of psychological collapse

During World War II, British and American bomber crews experienced casualty rates as high as 70% over the course of their tour; they effectively flew missions until they were killed. On those planes, pilots reported experiencing less fear than their turret gunners, who were crucial to operations but had no direct control over the aircraft. Fighter pilots, who suffered casualty rates almost as high as bomber crews, nevertheless reported extremely low levels of fear.” – Sebastian Junger

And so feeling danger is a result of mental uncertainty in the goal of avoiding a stimulus, which means that fear is inseparable from a desire to avoid:

“[S-R theory] will account for the fact that the rat will not repeat the action, i.e. running on to the grid, which led to a shock. But this isn’t enough. For a shock will not lead simply to a tendency not to repeat the actions which led to it, but also to a tendency to avoid the painful spot. Thus, a rat will not only fail to repeat the original action, but may also refrain from other actions which will have the same result.” – Charles Taylor

Furthermore, this violates basic methodological principles of behaviorism. Avoidance behavior cannot be explained in terms of fear, for avoidance will lead to avoiding the element of danger that would produce fear and provide reinforcement. Stimuli being paired without subsequent pain should extinguish the fear response. The only way for fear to persist in S-R theories is to have the non-occurrence of pain itself be a reinforcing state of affairs, but this would reinforce every action not followed by pain, instead of reinforcing only avoidance responses.

If that sounds complex, let us simply turn to what happens empirically:

“Avoidance responses extinguish slower than approach responses, because if avoidance is successful the animal stays out of the situation and never discovers that it has changed. But this is either an admission that the S-R theory is misguided, or no explanation at all.” – Charles Taylor

And thus the cycling back between falsity and tautology continues.

As they say: some people never learn.

E) Plastic People

So strict behaviorism, without any dependency on the organism, appears to be ruled out. For instance, given infantile amnesia, it would appear intractable to try to develop explicit responses dependent on memory before the age of 2. Likewise, if a person can register an image within 13 milliseconds but conscious perception doesn’t begin until around 270 milliseconds, then images presented faster than this rate should not be reinforcing in any way. Due to such species-specific limitations, the idea of getting a behavior, however simple, to be performed across all types of organisms seems to have fallen out of favor:

“In the darkest days of conditioning theory, one psychologist claimed that, if we had a really adequate theory of learning, we could use it to teach English to worms. Happily, however, he later recovered.” – Jerry Fodor

If there are not blank slate organisms, then it appears that associative learning has to be parasitic off of something, even if only a finite sensory apparatus. So why not postulate initial conditions and later modify? For example, an activity such as imprinting appears to be an innate disposition to exhibit a restricted behavioral repertoire in response to a restricted set of stimuli. For such actions, a behaviorist can simply cordon off some computational activity to an innate mechanism and then allow the laws of conditioning to fill in the remaining experiences necessary for development.

“[For behaviorism,] green light of 505 millimicrons wavelength may be a stimulus but my grandmother is not a stimulus.” – E.G. Boring

This would allow the effect of experience itself to depend on the way in which the events are processed. Functions of maturation can be posited as internal immutable rules, but S-O-R can still be avoided if one focuses instead on development. A maturing bird has the innate capacity to fly without training, but this is different from learning to fly to get food as a reward. However much the determinants must lie in the physiological make-up of the organism, learned behavior would still be guided by a history of learning. In the case of fixed action patterns, there may be instinctual acts, but they still interact with the environment. For example, a mother bird hears an alarm call and instinctively finds shelter. Even still:

“The most accessible shelter is not a function of features of this part of the environment alone but of the whole.” – Charles Taylor

So it appears behaviorism can save itself, even if allowing for innate mechanisms to be given a portion of their theory, so long as the resulting behaviors are conditioned in response to stimuli without intervening mental states.

A good example area for the bone of contention is the acquisition of human language. It appears almost paradoxical, as put by Ray Jackendoff: “An entire community of highly trained professionals [viz. theoretical linguists], bringing to bear years of conscious attention and sharing of information, has been unable to duplicate the feat that every normal child accomplishes by the age of 10 or so, unconsciously and unaided.”

“We may now draw the conclusion that the causation of behavior is immensely more complex than was assumed in the generalizations of the past.” – Nikolaas Tinbergen

So how does it arise? By environment if you’re a behaviorist. There need not be any conscious effort to learn, merely unnoticed reinforcement schedules driving the child to learn words and their uses. So for behaviorism, now in relational frame form, a child learns language from the environment, including parents. Empirically, as per Dennett: “It takes on average about six tokenings of a word in the presence of a child to generate her first clear efforts to say the word” after which “the infant just babbles away…and gradually bootstraps herself into comprehension by an unconscious process of massive trial and error”.

[Video: Yale’s child psychology lab, Dr. Arnold Gesell, 1947]
“The Anglophone tradition largely accepted the empiricist thesis that all concepts can be defined in a primitive basis of sensory concepts like RED or HOT or ROUND. This empiricist semantics was in turn supposed to be grounded in an epistemology according to which all knowledge is experiential in the long run. I think that even the friends of definitions now pretty much generally agree that the empiricist project failed. It was a cautionary example of what happens when you try to read your semantics off your epistemology.” – Jerry Fodor

That the environment so determines the behavior is a fundamental flaw, as pointed out by Chomsky:

“Reasoning in the same way, we may conclude that the parent induces the child to walk so that he can make some money delivering newspapers…Perhaps this provides the explanation for the behavior of the parent in inducing the child to walk: the parent is reinforced by the improvement in his control of the child when the child’s mobility increases.”

Noam Chomsky also proceeded to thoroughly discredit the remaining components of a behaviorist theory of language acquisition. This has been dubbed the Chomskyan Revolution by some. As summarized by Fodor:

“Cognitive science started back in 1959 with Chomsky’s epoch-making review of B.F. Skinner’s ‘Verbal Behavior’. Chomsky’s deconstruction of the Skinnerian paradigm set the agenda for modern representational theory of mind that we’ve all been working on ever since.”

As such, let’s look now at some examples Chomsky uses to deflate the behaviorist account of language acquisition.

First, according to a behaviorist account, proper nouns should be under control of the stimulus that simply is the specific person or thing the noun refers to. Simple enough. However:

“Suppose that I use the name of a friend who is not present. Is this an instance of a proper noun under the control of the friend as stimulus? Elsewhere it is asserted that a stimulus controls a response in the sense that presence of the stimulus increases the probability of the response. But it is obviously untrue that the probability that a speaker will produce a full name is increased when its bearer faces the speaker. Furthermore, how can one’s own name be a proper noun in this sense?”

Mands (eg. commands, requests, questions, advice) are claimed to be responses that are reinforced by consequences following from the relevant conditions of deprivation. But this is unsatisfactory:

“In the case of the mand PASS THE SALT, the word deprivation is not out of place, though it appears to be of little use for functional analysis. Suppose however that the speaker says GIVE ME THE BOOK, TAKE ME FOR A RIDE, or LET ME FIX IT. What kinds of deprivation can be associated with these mands? How do we determine or measure the relevant deprivation? I think we must conclude in this case, as before, either that the notion deprivation is relevant at most to a minute fragment of verbal behavior, or else that the statement ‘X is under Y-deprivation’ is just an odd paraphrase for ‘X wants Y,’ bearing a misleading and unjustifiable connotation of objectivity.”

When separated into specific mands, they fare no better:

“PLEASE PASS THE SALT is a request (but not a question), whether or not the listener happens to be motivated to fulfill it; not everyone to whom a request is addressed is favorably disposed. A response does not cease to be a command if it is not followed; nor does a question become a command if the speaker answers it because of an implied or imagined threat. Not all advice is good advice, and a response does not cease to be advice if it is not followed. Similarly, a warning may be misguided; heeding it may cause aversive stimulation, and ignoring it might be positively reinforcing. In short, the entire classification is beside the point.”

There are also mands that could be under control by conditions of aversion such as threats, beatings, etc. with the conditioned aversive stimuli providing reinforcement. But examined more closely:

“It would appear to follow from this description that a speaker will not respond properly to the mand YOUR MONEY OR YOUR LIFE unless he has a past history of being killed… Furthermore, even if we extend the system so that mands can somehow be identified, we will have to face the obvious fact that most of us are not fortunate enough to have our requests, commands, advice, and so on characteristically reinforced (they may nevertheless exist in considerable strength).”

For Skinner, a tact is defined as “a verbal operant in which a response of given form is evoked (or at least strengthened) by a particular object or event or property of an object or event”. For example, seeing a fox couples the optical stimulus with the pronouncing of FOX as a response, perhaps coupled with a curiosity drive sensitive to words relating to fellow mammals. However, this explanation is unconvincing:

“Consider now the problem of explaining the response of the listener to a tact. Suppose, for example, that B hears A say FOX and reacts appropriately: looks around, runs away, aims his rifle, etc. How can we explain B’s behavior?… B may never have seen a fox and may have no current interest in seeing one, and yet may react appropriately to the stimulus fox. Since exactly the same behavior may take place when neither of the assumptions is fulfilled, some other mechanism must be operative here.”

Despite these failings, proponents of analogizer learning algorithms claim some success on certain linguistic tasks. Analogizers work on local relations, eg. nearest neighbor algorithms. In the case of language, one such mechanism computes statistical correlations between the phonological endings of verbs and their past tense inflections across pairs of training data, and then, when needing to inflect a new verb stem, chooses the past tense most correlated with that ending in the training set. Statistically these appear to function adequately in programs such as Thomas Landauer’s Latent Semantic Analysis.
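A minimal sketch of such an analogizer may make the mechanism concrete. The toy training pairs, the crude suffix-overlap measure standing in for phonological similarity, and the stem-change rule are all invented for illustration:

```python
# Toy analogizer for past-tense inflection: find the training verb with
# the most similar ending, extract its stem -> past change, apply it.
# Training pairs and similarity measure are invented for illustration.

TRAINING = [
    ("walk", "walked"), ("talk", "talked"), ("jump", "jumped"),
    ("sing", "sang"), ("ring", "rang"), ("spring", "sprang"),
]

def suffix_overlap(a, b):
    """Count how many final characters two stems share."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def change_rule(stem, past):
    """Strip the common prefix; the rule replaces the stem's remainder."""
    i = 0
    while i < min(len(stem), len(past)) and stem[i] == past[i]:
        i += 1
    return stem[i:], past[i:]          # (old suffix, new suffix)

def past_tense(new_stem):
    # Nearest neighbor by ending similarity, then copy its inflection.
    best_stem, best_past = max(TRAINING, key=lambda p: suffix_overlap(new_stem, p[0]))
    old_suf, new_suf = change_rule(best_stem, best_past)
    if old_suf and new_stem.endswith(old_suf):
        return new_stem[: len(new_stem) - len(old_suf)] + new_suf
    return new_stem + new_suf

print(past_tense("bling"))   # patterns like "ring"/"sing" -> "blang"
print(past_tense("gork"))    # patterns like "walk" -> "gorked"
```

Note that everything here is a statistical correlation between surface forms; at no point does the program grasp what a past tense means.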

But working on nearest neighbors by statistical likelihood means a list is compiled wherein each symbol is given a causal connection to a related symbol. This is similar not only to S-R conditioning; it also functions like traditional Associationism, where the probability that one idea will elicit another depends on the strength of association (eg. frequency of reinforcement) between ideas. The problem with this, however, is:

“The strength of this association is in turn sensitive to the extent to which the ideas have previously been correlated. Associative strength was not, however, presumed to be sensitive to features of the content or the structure of representations per se.” – Jerry Fodor

In other words, a lack of semantic content and its representations means that all connections are determined by causal associations, and the clustering of those associations at certain thresholds will be considered patterns. Discovering patterns in a completely general way like this, instead of through a preprogrammed innate grammar, means that learning will alter the probabilities of transitions among different states but will do so without any grasp of the content. The way this organizes content is problematic:

“There’s a way of reading associationism that makes association content-sensitive: take the content of a concept to be the set of associations it elicits. This makes sense of the idea that mental processes are causal, but the notion of content it provides is preposterous; DOG is not part of the content of CAT [even though DOG is a high associate of CAT]. The problem is to get mental processes to be causal without adopting a preposterous theory of mental content.” – Jerry Fodor

With only causal associations at work, and no semantic content, how does thought ever focus on anything? If a set of associations causes tokenings of all its associated bonds, what’s to stop infinite regresses?

“Tokening pretty generally doesn’t cause one to think its associates; a good thing too, since, if it did, one’s thinking would be forever getting derailed by one’s associations. I’ve heard it suggested that this is what actually does go wrong with schizophrenics; thinking HOUSE causes them to think CHIMNEY which causes them to think SMOKE which causes them to think FIRE which causes them to think WATER and so on and on; with the consequence that they never manage to think I LIVE IN THE THIRD HOUSE ON THE LEFT. I have no idea whether that’s right about schizophrenics; but the problem it points to is perfectly real.” – Jerry Fodor

Thus, content is important because one can parse ideas if there is a representation of semantic content and not just causal associations. Put succinctly by Fodor:

“What else one thinks when one thinks HOUSE depends a lot on which of one’s projects the thinking is in aid of. The strong intuition is that one’s awareness of one’s project somehow guides the flow of one’s thought; but this intuition must be anathema to associationists, since they hold that the flow of thought is determined entirely by the strength of the associations among ideas.”

So basic association of stimuli or ideas is insufficient to account for learning, or even the constitution, of language. And whatever the outcome of empirical findings regarding other attributes, whether they are due to innate maturation or external development, language and thought are primarily not driven by external reinforcement.

F) Unnatural Selection

Even with innate grammatical parameters granted, behaviorism can continue. Maybe there is, according to Dennett, “an internal set of principles of rules or constraints that allow the child to narrow down her search through the vast space of possibilities, a directed, not random, process of trial and error.” Navigating this way needs to involve processes of selection, which are mimicked by the appropriately named Evolutionary or Genetic algorithms that generate large populations of codings; those that make progress get to reproduce and continue again. What they do is narrow down the search space by “reasons”. I use scare quotes because for Evolutionary algorithms, a reason is instrumentalist, which isn’t a problem for behaviorism since, as per Dennett: “one of our tricks is having reasons and not just acting for reasons” which allows for us to act “by citing reasons and assuming that they are guided by something like perceptual monitoring, the intake of information that triggers, modulates, and terminates the responses.” In other words, even though the theory has moved to reasons from reflexes, it is still S-R.
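The selection-by-populations idea can be sketched in a few lines. The bitstring “genome”, the arbitrary target, and the fitness function here are all stand-ins for illustration, not any particular model of language acquisition:

```python
import random

# Toy genetic algorithm: random bitstring genomes, fitness = number of
# bits matching an arbitrary target, fitter half reproduces with mutation.
random.seed(0)
TARGET = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

def fitness(genome):
    return sum(g == t for g, t in zip(genome, TARGET))

def mutate(genome, rate=0.1):
    return [1 - g if random.random() < rate else g for g in genome]

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(20)]
for generation in range(100):
    population.sort(key=fitness, reverse=True)
    if fitness(population[0]) == len(TARGET):
        break                                  # solution found
    survivors = population[:10]                # selection
    population = survivors + [mutate(random.choice(survivors)) for _ in range(10)]

print(fitness(population[0]))                  # best genome's score
```

Nothing in the loop understands the target; “progress” is nothing over and above differential reproduction, which is exactly the instrumentalist sense of “reason” at issue.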

The first problem with this is that acting out of anticipations or expectations implies the functioning of an intentional state of a mind. Acting “as if” intentionality is there, but really isn’t, as per instrumentalism, seems bizarre. Put briefly:

“Fictions can’t select things, however hard they try. Nothing cramps one’s causal powers like not existing.” – Jerry Fodor

This also makes the “reasons” seem oddly conspiratorial. Not a conspiracy against people, just conspiracy-seeking in mindset. A conspiracy theory works by explaining behavior through an imputed interest that the agent of the behavior doesn’t acknowledge. To say that an agent A did X in aid of Y leaves open whether the motive is accessible to A or not. This could mean that A can be removed completely, with a conspiratorial motive attributed solely to Y. I say it seems conspiratorial because how can you argue that A did X in aid of Y when A doesn’t own up to Y? The only thing you can do is argue that X would be the rational thing for A to do if Y had been the motive; but where an interest in Y would rationalize X, so could an interest in P or Q or R. For example:

“It’s reasonable of Jones to carry an umbrella if it’s raining and he wants to keep dry. But, likewise, it’s reasonable for Jones to carry an umbrella if he has in mind to return it to its owner.” – Jerry Fodor

So to be decisive one needs a behavior that would be reasonable only given an interest in Y. This is rare because, just as an interest in Y rationalizes doing X, so would an interest in X itself. And doing X solely for an interest in X is intrinsic motivation, which relies on S-O-R (as in Section D above).

Also, in terms of methodology, this leads to a cul-de-sac:

“Talk of schedules of reinforcement here is entirely pointless. How are we to decide, for example, according to what schedules covert reinforcement is arranged, as in thinking or verbal fantasy, or what the scheduling is of such factors as silence, speech, and appropriate future reactions to communicated information?” – Noam Chomsky

To get past this methodological pitfall, a behaviorist must cash out metaphors and proceed to hill climbing, where small steps blindly taken can go far. Here again, the terminology “genetic algorithm” is appropriate, as both behaviorism and genetic selection have theories that proceed by hill climbing.
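Hill climbing itself is simple enough to sketch. The single-peaked “landscape” below is an assumption chosen so that blind climbing succeeds; on rugged landscapes the same loop strands itself on local maxima:

```python
import random

# Toy hill climber: take small random steps, keep only improvements.
random.seed(1)

def score(x):
    return -(x - 3.0) ** 2          # one smooth peak at x = 3

x = 0.0
for _ in range(10_000):
    step = random.uniform(-0.1, 0.1)
    if score(x + step) > score(x):  # blind step, kept only if uphill
        x += step

print(round(x, 3))                  # ends near the peak
```

The climber never represents where the peak is; it only compares adjacent scores, which is the point of calling the steps blind.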

However, there is a problem with this type of selection process:

“The explanation as to why a particular kind of creature evolves a particular trait in a particular ecology, is that for that kind of creature in that situation, having the trait is a cause of fitness. But then [the explanation] can’t also claim that in the sense that matters a trait was ‘selected for’ means that it is a cause of reproductive success. If it did mean that, the theory of natural selection would reduce to a trait’s being a cause of reproductive success explains its being a cause of reproductive success, which explains nothing (and isn’t true). This is all old news; because John’s being a bachelor is his being an unmarried man, John’s being a bachelor doesn’t explain his being an unmarried man.  Psychologists who hoped to defend the ‘law of effect’ by saying that it is true by definition that reinforcement alters response rate made the same mistake.” – Jerry Fodor

In other words, there is a circularity to hill climbing:

“You can’t replace a mechanistic explanation with a claim of analyticity…Consider the following analogy: There is a regularity that incoming calls on my phone cause my ringer to ring. How does that work? No problem! If incoming calls didn’t cause the ringer to ring it wouldn’t be a ringer. Being ‘a ringer’ and being ‘caused to ring by incoming calls’ are inter-defined. That being so, there doesn’t need to be a mechanism that causes a ringer to ring in response to incoming calls. The job is done by causation itself, no other mechanism is needed.” – Jerry Fodor

But let’s ignore this for now. What could be on offer that makes this poor theorizing palatable? Behaviorists will mention microfeatures, or parts of a stimulus that can be isolated out and used for purposes of discrimination or generalization, the famous example of course being John Watson’s Little Albert.

“Behaviorists say that the question of what minds are for doesn’t arise, since there aren’t any… Associationists say that we don’t need a mind to think with (‘we don’t need an “executive”’ is how they put it) because ideas think themselves in virtue of the mechanical connections among them… Neuropsychologists say that since the mind is the brain, we don’t need the one because we have the other. That this bundle of muddle is recommended as the hard-headed, scientific way to do psychology is, I think, among the wonders of the age.” – Jerry Fodor

However, the term microfeature appears to be an empty notion:

“If we look at a red chair and say RED, the response is under the control of the stimulus redness; if we say CHAIR, it is under the control of the collection of properties (for Skinner, the object) chairness, and similarly for any other response. This device is as simple as it is empty.” – Noam Chomsky

Microfeatures also lack explanatory and predictive power:

“A typical example of stimulus control for Skinner would be the response to a piece of music with the utterance MOZART or to a painting with the response DUTCH. These responses are asserted to be “under the control of extremely subtle properties” of the physical object or event. Suppose instead of saying DUTCH we had said CLASHES WITH THE WALLPAPER, I THOUGHT YOU LIKED ABSTRACT WORK, NEVER SAW IT BEFORE, TILTED, HANGING TOO LOW, BEAUTIFUL, HIDEOUS, REMEMBER OUR CAMPING TRIP LAST SUMMER?, or whatever else might come into our minds when looking at a picture. Skinner could only say that each of these responses is under the control of some other stimulus property of the physical object.” – Noam Chomsky

Machine learning proponents would argue that overlapping information is sufficient to provide a way to process microfeatures. In the course of generating interaction information, independent events can be linked to a common effect. And so it is thought that generalization emerges from an overlap of microfeatures.
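As a sketch of what the overlap story amounts to, one can treat stimuli as bare sets of microfeatures and let generalization be nothing but feature overlap. The feature inventories below are invented for illustration:

```python
# Stimuli as sets of microfeatures; a trained response "generalizes"
# to a new stimulus in proportion to raw feature overlap.

def overlap(a, b):
    return len(a & b) / len(a | b)     # Jaccard similarity

CUP   = {"has-a-handle", "concave", "ceramic", "small"}
MUG   = {"has-a-handle", "concave", "ceramic", "large"}
PLATE = {"flat", "ceramic", "large"}

# A response conditioned to CUP transfers by overlap alone:
print(overlap(CUP, MUG))     # 0.6 -- mostly shared features
print(overlap(CUP, PLATE))   # ~0.17 -- one shared feature
```

Note that the overlap score treats every feature as equally salient; nothing in the arithmetic selects which properties matter in a given context.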

However, this talk of microfeatures employs the enthymeme that such aspects can be isolated out of a particular context. Whether an event or object is reinforced or unnoticed depends upon the other elements in the situation. If situations require this context, then isolating details depends on interpreting that context, which requires the attention of intentionality. As is seen in gestalt works, the same stimulus elements can result in different structures being seen, so we cannot account for the structure simply in terms of the properties of the elements. It is not the properties present, but the selection of certain properties, that is relevant. And it appears that making this distinction and selecting is not achievable by simple trial and error:

“A grasp of the essential structure of the problem, which [Max Wertheimer] calls ‘insight’…is introduced by the programmers before the actual programming begins…The ability to distinguish the essential from the inessential seems to be a uniquely human form of information processing not amenable to the mechanical search techniques…[because] all searching, unless directed by a preliminary structuring of the problem, is merely a blind muddling through” – Hubert Dreyfus

So without some sort of guidance, the problem of exponential growth (of search space) threatens, since the number of properties to be examined would be indefinite. To get past this, there must be some ability to sort, not just identify microfeatures. If a response is held to be conditioned to all stimuli all the time, as would happen without S-O-R, it’s hard to see how anything can be selected. And a response might just as well generalize to all other situations as to none. Without intentionality, it is hard to see how anything could be singled out as a salient feature because:

“[There is an] indeterminacy that you always get when you try to interpret an intentional process in a domain that is specified extensionally. Say, if you like, that the machine sorts for size rather than color. But, if all and only red marbles stay on top, you might equally say that the machine is sorting for color rather than size…Sorting for size isn’t distinguishable from sorting for color.” – Jerry Fodor
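Fodor’s marble case can be made concrete in a few lines. In a world where, by stipulation, all and only the red marbles are large, a sorter keyed to color and a sorter keyed to size behave identically (the marbles are invented for illustration):

```python
# Two "sorters" over coextensive properties: behavior alone cannot say
# which property was selected for.

marbles = [("red", "large"), ("red", "large"),
           ("blue", "small"), ("blue", "small")]

sort_by_color = [m for m in marbles if m[0] == "red"]
sort_by_size  = [m for m in marbles if m[1] == "large"]

print(sort_by_color == sort_by_size)   # True: extensionally indistinguishable
```

Only a counterfactual (what would the machine do with a small red marble?) separates the two hypotheses, and answering it requires more than the actual input-output record.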

Thus sorting must involve intentionality, rather than mere responding to microfeature stimuli, because extensional, as opposed to intentional, filters fail:

“Selection-for problems need to appeal to counterfactuals if they are to distinguish between coextensive hypotheses… and only minds are sensitive to distinctions among counterfactuals.” – Jerry Fodor

Infinitely many properties can free-ride off of other coextensive properties, as in the example above, so intentionality is needed to sort and distinguish. This ability to sort, with intentionality and counterfactuals, relies on constituents. Constituent structures are semantically evaluable relations, while microfeatures are derived automatically from statistical properties of samples of stimuli. So, for example, ‘has-a-handle’ is a microfeature, but not a constituent, of CUP. In other words:

“While the extensions of [microfeature] predicates are in a set/subset relation, the predicates themselves are not in any sort of part-to-whole relation. The expression ‘has-a-handle’ isn’t part of the expression ‘cup’ any more than ‘is an unmarried man’ is part of the phrase ‘is a bachelor’.” – Jerry Fodor

Even when considering relatively few examples, entertaining concepts in this way can be a predicament instead of a useful combination of thoughts:

“Imagine the following situation: at time t, a man is looking at the sky (so the nodes corresponding to SKY and BLUE are active) and thinking that John loves Fido (so the nodes corresponding to JOHN, LOVES, and FIDO are active), and the node FIDO is connected to the node DOG (which is in turn connected to the node ANIMAL) in such fashion that DOG and ANIMAL are active too. We can, if you like, throw it in that the man has got an itch, so ITCH is also on. According to the current theory of mental representation, this man’s mind at t is specified by the vector (+JOHN, +LOVES, +FIDO, +DOG, +SKY, +BLUE, +ITCH, +ANIMAL). And the question is: which subvectors of this vector correspond to thoughts that the man is thinking? Specifically, what is it about the man’s representational state that determines that the simultaneous activation of the nodes {JOHN, LOVES, FIDO} constitutes his thinking ‘John loves Fido’, but the simultaneous activation of {FIDO, ANIMAL, BLUE} does not constitute his thinking that ‘Fido is a blue animal’?” – Jerry Fodor

Better still, take a simpler example:

“If this is not immediately clear, consider the case where the simultaneously active nodes are {FIDO, SUBJECT-OF, BITES, JOHN}. Is the propositional content that Fido bites or that John does?” – Jerry Fodor
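The collapse Fodor describes can be shown directly: treat the activation vector as a flat set of node names (following the quote), and the two distinct thoughts become one representation:

```python
# The node-activation vector as a flat set: two distinct "thoughts"
# collapse into the same representation.

fido_bites_john = {"FIDO", "SUBJECT-OF", "BITES", "JOHN"}
john_bites_fido = {"JOHN", "SUBJECT-OF", "BITES", "FIDO"}

print(fido_bites_john == john_bites_fido)   # True: the content is ambiguous
```

Nothing in the vector says which node SUBJECT-OF binds to; the who-bites-whom fact has no representation at all.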

Compare this with how constituents tie predicates together and form propositions:

“[The copula ‘is’] is employed to distinguish the objective unity of given representations from the subjective…Only in this way does there arise from the relation a judgment, that is a relation which is objectively valid, and so can be adequately distinguished from a relation of the same representations that would have only subjective validity – as when they are connected according to laws of association. In the latter case, all that I could say would be ‘If I support a body, I feel an impression of weight’; I could not say, ‘It, the body, is heavy’. Thus to say ‘The body is heavy’ is not merely to state that the two representations have always been conjoined in my perception…what we are asserting is that they are combined in the subject.” – Immanuel Kant

In other words, mental representations can distinguish concepts that are simultaneously entertained (THIS, BODY, HEAVY) from those concepts that are predicated (THIS BODY IS HEAVY). How so?

BODY and HEAVY have referents, both as microfeatures and as constituents, so this is not the relevant aspect. What matters is that microfeatures process the copula (eg. IS) as just another part of the list of lexical items, which makes the copula a tertium quid. In other words, if IS is not a link but just another causal referent, then it also needs a link. And a regress, pointed out by Bradley, takes hold and becomes vicious.

On the other hand, if treated as constituents, predicated concepts involve a process of construction, viz. each symbol and even syncategorematic relations between them are given content. This construction occurs where the copula links the subject and predicate into the state of affairs of an intentional state (ie. as content of a judgment). This construction involves not just a mere summation, but a unity of the proposition since the intentional state forming the judgment has a unity of apperception.

And so, the proposition “This body is heavy” is not merely a summation of the list of items consisting of This+body+copula+heavy.

Also, it is this uniform process of intentionality operating on constituents that makes systematic generalization possible:

“We constantly read and hear new sequences of words, recognize them as sentences, and understand them…[and] we recognize a new item as a sentence not because it matches some familiar item in any simple way, but because it is generated by the grammar that each individual has somehow and in some form internalized. And we understand a new sentence, in part, because we are somehow capable of determining the process by which this sentence is derived in this grammar.” – Noam Chomsky

The understanding granted by intentionality is partly because intentionality allows data to be seen under a certain description, ie. as part of working towards or away from a certain goal. And this seems to be the case for even the most basic mental aspects:

“We construct perception instead of revealing its own proper functioning and we once again miss the primordial operation that impregnates the sensible with a sense and that is presupposed by every logical mediation and every psychological causality.” – Maurice Merleau-Ponty

Some of these difficulties can be evaded if programmers do some selecting beforehand for the machine. This point is granted even by Dennett:

“Human imagination, the capacity we have to envision realities that are not accessible to us by simple hill climbing from where we currently are, does seem to be a major game changer, permitting us to create, by foresighted design, opportunities and enterprises and artifacts that could not otherwise arise.”

This brings us back to the problems of derived intentionality and non-algorithmic insight discussed previously (here in Sections H and F respectively).

“Behaviorism now survives as a horrible example of what can happen when psychologists believe what philosophers tell them about the scientific method.” – Jerry Fodor

In short, evolutionary algorithms of learning fail to properly select among coextensive traits, they fail to distinguish between different intentional content, and they do so without good “reasons.” With even Dennett adding in derived intentionality, who thought this was a good idea again?

G) Master of Puppets

The last remaining type of machine learning is connectionism. Connectionism posits that mental states arise from the way neurons are connected to each other and become active within the system, through the likes of McCulloch-Pitts neurons and Hebbian synapses. These systems function by a series of interconnected units, each one of which receives activity that is graded as excitatory, inhibitory, or some combination of the two. This input activity is summed, perhaps past a threshold, and the resulting output modulates the activity of the same unit or others according to a ‘weight’. The modulation is considered “learning”, causing feedback and forming “memories” that can lead to more “learning”, and so on.
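
The unit just described can be sketched in a few lines; the weights, threshold, and learning rate below are illustrative, not drawn from any particular model.

```python
def unit_output(inputs, weights, threshold=0.5):
    """Sum inputs through signed weights (excitatory > 0, inhibitory < 0)
    and fire only if the total crosses the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def hebbian_update(inputs, weights, output, rate=0.1):
    """Hebb's rule: strengthen the weights of inputs active alongside the output."""
    return [w + rate * x * output for x, w in zip(inputs, weights)]

weights = [0.4, -0.3, 0.6]          # one inhibitory connection
inputs = [1, 1, 1]
out = unit_output(inputs, weights)  # 0.4 - 0.3 + 0.6 = 0.7 >= 0.5, so it fires
weights = hebbian_update(inputs, weights, out)  # "learning": active weights grow
```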

From these principles, some similarities to neural connectivity are realized. In both systems, neural connections influence brain activity, memory engrams are not spatially local, neurons activate after passing a threshold from the summation of dendritic inputs, etc. Such findings have led groups such as the Parallel Distributed Processing Research Group to have the explicit goal of “replacing the ‘computer metaphor’ as a model of the mind with the ‘brain metaphor’.”

“Suppose that one does demand ‘operational criteria’ for applications of the explanatory vocabulary of psychological theories. Since there is no reason to doubt that how an animal behaves depends on what it believes, wants, thinks, intends and remembers, we will need operational criteria for each of those…There simply aren’t such criteria, nor ought we to expect there to be. Scientific theories are about what there is in the world; they are not about how to tell what there is in the world.” – Jerry Fodor

This type of machine learning does have some advantages. It appears able to process some optical illusions that were previously problematic in the area of neural binding. For example, the Necker Cube used to present a problem for programs that lacked learning, but a connectionist network can overcome this. Such a model represents the two arrangements that appear to human perception as two coalitions of units, with activation within each coalition and inhibition between coalitions. Thus, the two dominant stable perceptions can be processed, similar to human visual perception.
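
The coalition idea can be caricatured with two numbers, one per interpretation of the cube: each coalition excites itself and inhibits the other, so a small initial bias settles into one of the two stable percepts. The rates and the bias here are arbitrary illustrations.

```python
def settle(a, b, excite=0.2, inhibit=0.3, steps=50):
    """Iterate two coalition activations: each boosts itself,
    suppresses its rival, and is clamped to [0, 1]."""
    for _ in range(steps):
        a, b = a + excite * a - inhibit * b, b + excite * b - inhibit * a
        a = max(0.0, min(1.0, a))
        b = max(0.0, min(1.0, b))
    return a, b

# A slight bias toward one reading of the cube wins outright:
front_up, front_down = settle(0.55, 0.45)   # settles near (1.0, 0.0)
```

Swapping the bias (0.45 versus 0.55) lands the network in the other stable percept, mirroring the cube’s bistability.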

“Many AI programmers seem focused on finding a single powerful mechanism to induce everything from statistical data. This is much like what the psychologist B.F. Skinner imagined…when he concluded all human thought could be explained by mechanisms of association. The whole field of cognitive psychology grew out of the ashes of that oversimplified assumption.” – Gary Marcus

However, this type of arrangement is not without problems. Prominent among them is the Training Problem. A connectionist network’s units have adjustable thresholds, tuned by training on exemplars and using feedback as error correction. In other words, following the network’s output, the thresholds between its units are changed depending upon whether the output was correct or not. Without this, no “learning” would be possible. But if a neural network needs information for error correction, where does it get it? It turns out the network requires some sort of external assistance to tell it whether its output was the correct response to the input it was given; without this knowledge it cannot tell whether it has made an error, and so cannot be trained. This sort of help is the type of thing a human mind functions without:

“For many cognitive skills there is no easy way networks in the brain can obtain the error-correction feedback needed to train them. The problem is recursive: if there was a process in the brain which could provide this information then its own development would depend in turn upon some further process.” – John Skoyles
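
The dependence Skoyles describes is visible in even the simplest supervised update: a perceptron-style step (a hypothetical minimal case, with illustrative numbers) can only run if something outside the network hands it the correct answer.

```python
def train_step(weights, inputs, target, rate=0.1):
    """One error-correction step; 'target' must be supplied from outside."""
    output = 1 if sum(x * w for x, w in zip(inputs, weights)) > 0 else 0
    error = target - output      # no external target, no error, no learning
    return [w + rate * error * x for x, w in zip(inputs, weights)]

weights = [0.0, 0.0]
# With a teacher supplying the label 1 for input (1, 1), the weights move:
weights = train_step(weights, [1, 1], target=1)   # now [0.1, 0.1]
# Strip the target out and the error term is undefined: the very quantity
# the network "learns" from has to come from somewhere it cannot supply.
```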

So the role of inputs and experience is questioned. Still others argue that, at best, machine learning provides a mere simulation of mental acts. And just as a simulation of a rain storm isn’t wet, the simulation of consciousness can approximate but not recreate the mental:

“At best, research in artificial intelligence can write programs which allow the digital machine to approximate, by means of discrete operations, the results which human beings achieve by avoiding, rather than resolving, the difficulties inherent in discrete techniques.” – Hubert Dreyfus

After all, it’s hard to see how gathering information will generate knowledge. As explained before (Section F here), information is not knowledge, and bridging the gap is a dubious prospect. To wit:

“If you get more and more data, and better and better statistics, you can get a better and better approximation to some immense corpus of text. But you learn nothing.” – Noam Chomsky

Some behaviorists may bite the bullet and admit that information is not knowledge, that competence is not comprehension. As Dennett puts it, just as “an elevator’s design accounts for its competence in controlling its activities without the elevator having to comprehend what it is doing”, it follows that “Deep Learning and Bayesian methods…will give us great answers to hard questions like never before, they won’t be able to tell us why.” But this does not deter the theorists. They insist that such learning relies on evidential logic rather than traditional knowledge, which means finding patterns from information alone, knowledge be damned. For some, this level of complexity and recurrence even results in Super-Turing computations that escape the problems of Turing machines.

But leaving aside, rather than solving, one set of problems often means picking up another set. For evidential logic, conveying understanding to another becomes simply a matter of increasing the strength of the listener’s already available behavior. But this would make understanding dependent on a history of reinforcement:

“It is clear that understanding a statement cannot be equated to shouting it frequently in a high-pitched voice (high response strength), and a clever and convincing argument cannot be accounted for on the basis of a history of pairings of verbal responses.” – Noam Chomsky

It is obvious that knowledge does not work in this way, although weighted inputs would. There is also the problem of knowing counterfactuals as opposed to merely adding more bits:

“If A imparts to B the information (new to B) that the railroads face collapse, in what sense can the response THE RAILROADS FACE COLLAPSE be said to be now, but not previously, available to B? Surely B could have said it before (not knowing whether it was true), and known that it was a sentence (as opposed to COLLAPSE FACE RAILROADS THE). Nor is there any reason to assume that the response has increased in strength, whatever this means exactly (e.g., B may have no interest in the fact, or he may want it suppressed). It is not clear how we can characterize this notion… without reducing [a behaviorist] account of ‘imparting knowledge’ to a triviality.” – Noam Chomsky

The distinction is made even more obvious in trying to distinguish illusory from veridical perceptions. Google’s DeepDream was a trained neural network that would find some stimulus, like a dog, by means of pattern matching. Enhancing the system with more learning and more calibration, however, meant the pattern-matching faculty grew overactive: even the slightest noise resembling the pattern information would be identified as a dog. This type of overactive pattern matching essentially results in hallucinations, but without any ability to discern reality; where there is only evidential logic, there is no “normal” to “know” for comparison.
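
The failure mode can be caricatured without any real network: hold the scoring function fixed and let “more calibration” amount simply to a lower match threshold. The scoring function, thresholds, and noise here are all hypothetical stand-ins, not anything from DeepDream itself.

```python
import random
random.seed(0)   # reproducible "noise"

def dogness(signal):
    """Stand-in similarity score between a signal and a 'dog' template."""
    return sum(signal) / len(signal)

noise = [random.random() * 0.2 for _ in range(100)]   # weak, patternless input

early_threshold = 0.5         # conservative net: noise is just noise
overtrained_threshold = 0.05  # over-calibrated net: noise is a "dog"

sees_dog_early = dogness(noise) > early_threshold           # False
sees_dog_overtrained = dogness(noise) > overtrained_threshold  # True: hallucination
```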

“The trouble with behaviorism is that it did not allow that mental states were causes and effects.” – Ned Block

But for a behaviorist, there’s no reason to think there’s any validity in finding a true match; there is only “another” or a “more efficient” pattern for matching. This is embodied in machines such as the Augmented Transition Network, which functions on the basis of graph theory. Graph theory is a form of relationalism, which says that all there is to objects is their set of inter-relationships. And so, like a mathematical graph, the nature of any entities involved reduces, via the Ramsey method, to their place within a structural description. Under this view, it is wrong to think there is anything more to entities than their relations within a system.

The problem, however, is that relations require relata:

“Besides external presence, ie. relational properties of the substance, there are other intrinsic properties, without which the relational properties of the substance would not exist, because there would be no subject in which they inhered.” – Immanuel Kant

Without intrinsic properties, the relations of graph theory end up empty. Having a system of relations is trivially true of a set of objects, since given enough entities any system of relations over those entities is instantiated, viz:

“Any collection of things can be organized so as to have the structure W, provided there are the right number of them.” – Maxwell Newman

This means relations are simply sets of ordered sequences of entities (eg. a two-place relation is a set of ordered pairs). Given the entities, those sets and sequences will automatically exist, and given Ramsey sentences one can always generate other, equally satisfactory networks of relations, provided they respect the original set’s cardinality. So unless relata have an intrinsic character, relationalism says nothing about the world beyond an assertion of cardinality.
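
Newman’s observation is easy to make concrete: a relational structure is just a set of ordered pairs, so any collection of the right cardinality instantiates it under a relabeling. The entities below are deliberately arbitrary.

```python
structure_W = {("a", "b"), ("b", "c")}   # some two-place relation over {a, b, c}
domain_W = sorted({x for pair in structure_W for x in pair})

arbitrary_things = ["spoon", 7, None]    # any three objects whatsoever

# Transport the relation across a bijection between the domains:
relabel = dict(zip(domain_W, arbitrary_things))
instantiated = {(relabel[x], relabel[y]) for x, y in structure_W}
# instantiated == {("spoon", 7), (7, None)} — the arbitrary collection
# now "has the structure W"; only the cardinality did any work.
```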

Minds have intrinsic natures. These can be considered qualia since:

“There can no more be a subject without an experience than there can be a surface without extension” – Galen Strawson

But there is no intrinsic content to connectionist networks, as they specify causal relations the same way as Associationist connections. The nodes in the models are labeled with semantic content, but the weights acting on the nodes are statistical properties of the events in the environment, not the content of the node. Nodes are causally connected to other nodes, but there is no structural relation between them. This configuration has implications, however, when one needs to combine them, as is done in thinking:

“Take a sentence of a dozen words, and take twelve men and tell them each one word. Then stand the men in a row or jam them in a bunch, and let each think of his word as intently as he will; nowhere will there be a consciousness of the whole sentence.” – William James

Being configured in such a way, without consideration of intrinsic properties, is insufficient to yield thought. On the other hand, such a configuration of men would be able to form a bridge or other spatial object, because it is linkage without any respect for intrinsic representations and content. This leads to predictable problems:

“Connectionists agree… that concepts are mental particulars and that they have causal powers. But connectionist architectures provide no counterpart to the relation between a complex concept and its constituents. So they have, just as Hume would have predicted, hopeless problems about productivity and systematicity.” – Jerry Fodor

Systematicity, as discussed above (see Section C), means that a lexical item must make approximately the same semantic contribution to each expression in which it occurs. This systematicity allows for productivity, or the ability to create an unbounded number of propositions. Productivity in turn stems from compositionality, where the constituent structures of mental representations are combined in ways sensitive to their syntax (see Section F). However, nodes are a gathering of lists and concepts, not constituents and propositions. In practical terms, this means that:

“In a Connectionist machine, adding to the memory (eg. by adding units to a network) alters the connectivity relations among nodes and thus does affect the machine’s computational structure. Connectionist cognitive architecture cannot, by their very nature, support an expandable memory, so they cannot support productive cognitive capacities. The long and short is that if productivity arguments are sound, then they show that the architecture of the mind can’t be Connectionist.” – Jerry Fodor
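
The set-versus-structure contrast behind Fodor’s complaint fits in four lines: the same lexical items, taken as a set of active nodes, cannot distinguish two propositions that their syntax keeps apart.

```python
# "Fido bites John" vs "John bites Fido": identical lexical items,
# distinct propositions — distinguished only by constituent structure.
proposition_1 = ("BITES", "FIDO", "JOHN")   # (predicate, subject, object)
proposition_2 = ("BITES", "JOHN", "FIDO")

same_nodes = set(proposition_1) == set(proposition_2)   # True: same node list
same_thought = proposition_1 == proposition_2           # False: different structure
```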

So it came to pass: a learning network modeled on the brain falters in the face of learning specific tasks (eg. memory) and other quite general brain capacities.

However neurally sound this approach may be, it is not psychologically sound. Where does this leave the future of neural nets? As viewed by Fodor:

“An Associationist account of the nature of mental processes…is a retreat to Hume’s picture of the mind, and it has a problem that we don’t believe can be solved: although mental representations are structured objects, association is not a structure sensitive relation. The problem is thus how to reconstruct the semantical coherence of thought without postulating psychological processes that are sensitive to the structure of mental representations…It’s an instructive paradox that the current attempt to be thoroughly modern and ‘take the brain seriously’ should lead to a psychology not readily distinguishable from the worst of Hume and Berkeley.”

H) End

So there are many problems here, none of which bodes well for a behaviorist.

“I took a course in psych [101]. All I learned from that is that a pigeon will smash its face against something in order to get a pellet of food.” – Lewis Black

While it works for reflexive responses, behaviorism is unable to accommodate intentionality and therefore goal-directed behavior. The environment can be brought to account in altering a system’s initial configuration, but in ignoring mental representations, that system lacks the powers necessary for thought or language.

The takeaway message here is so simple as to be almost a truism: in order to explain goals, intentionality, and thought, the ontology of your theory should have goals, intentionality, and thought:

“It’s been clear since Aristotle that explaining actions by attributing beliefs and desires to their agent is the very paradigm of how a mentalistic psychology does its thing. I did know, sort of vaguely, that some philosophers were prepared to live with beliefs and desires only as explanatory fictions: creatures like us believe ‘as if’ we had them, but that’s an illusion to which a due skepticism about theoretical entities was the recommended antidote…That struck me as just cheating, and it still does. For better or worse, the ontology of the theories one accepts is ipso facto the ontology to which one is committed.” – Jerry Fodor

That being so, what remains for S-R theory is risible, so let’s end comically:

“Psychologists just don’t find behaviorism very reinforcing these days. Skinner might think this was unfair, but if he demanded reasons, if he asked his critics to justify their refusal to follow his lead, he would have to violate his own doctrines and methods.” – Daniel Dennett

So how does one incorporate plasticity into a theory of mind without lapsing into an S-R theory? Well, from behaviorism we can see that learning does happen, but we also know, from where behaviorism fails, that goals occur. So what happens in between? What happens in cases such as building a habit, which operates between environmental and intrinsic influence?

That may be a useful paradigm case. We shall look at it next time.

