Since the question has been asked, it’s worth reading the detailed history I was asked to write by the ACM in 1992, which became one of the sections of the 2nd History Of Programming Languages conference: The Early History Of Smalltalk.
In brief for here: I saw parts of the idea in various forms starting in the early 60s and thought it useful, but stayed asleep until 1966, when I saw Ivan Sutherland’s Sketchpad system (which completely changed the way I looked at computing), and within a week saw and learned the first Simula, which was less grand than Sketchpad but showed how ordinary programming could be changed to take advantage of instantiations of processes.
This double whammy “rotated” me to see things from very different perspectives.
A key part of the “rotation” was that
(1) at that time, multi-processing and time-sharing systems were using hardware modified to isolate separate processes in the form of “virtual versions of the hardware”;
(2) ARPA was in the process of talking about doing the ARPAnet, which would allow many computers to intercommunicate;
(3) my two main concentrations in college had been pure math and molecular biology.
The form of the “rotation” was ridiculously simple: the realization that a computer could compute what any other computer could compute, and thus you could represent anything computable at any scale using only intercommunicating computers (most of them virtual) as building blocks.
This was completely impractical (which I think was one of the reasons I didn’t think of it earlier). The molecular biology and the ARPAnet really helped, because it was roughly known by the mid-60s that each cell in our body contained billions of informationally interacting components, and that each of us had 10 to 100 trillion cells. That kind of scaling actually worked, and was far beyond what computing could do.
I think that seeing Sketchpad shocked me into being able to use “pure math mode” as part of the thinking rather than just the “worry about efficiency” thinking I was used to doing when computing. If you allowed “infinitely fast and large” computing, then the idea made excellent sense: it was a universal building block for all scales, and what remained were the central problems of designing complex systems.
The nature of the intercommunications would allow schemes to be devised that were like algebras in pure math, so that terms like “+” or “sort” or “display” could have both general and specific meanings.
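(As a rough illustration in today’s terms: here is a minimal Python sketch, with made-up Money and Vector classes, of how one term like “+” can keep a general meaning, “combine these”, while each kind of object supplies its own specific one.)

```python
# A minimal sketch, in modern Python rather than anything we had then.
# The Money and Vector classes are invented just for this example.

class Money:
    def __init__(self, cents):
        self.cents = cents
    def __add__(self, other):            # the specific meaning of "+" for Money
        return Money(self.cents + other.cents)
    def __repr__(self):
        return f"Money({self.cents})"

class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __add__(self, other):            # the specific meaning of "+" for Vector
        return Vector(self.x + other.x, self.y + other.y)
    def __repr__(self):
        return f"Vector({self.x}, {self.y})"

def total(items):
    # The general meaning: "total" works for anything that answers "+",
    # without ever knowing how the objects are represented inside.
    result = items[0]
    for item in items[1:]:
        result = result + item
    return result

print(total([Money(150), Money(250)]))      # Money(400)
print(total([Vector(1, 2), Vector(3, 4)]))  # Vector(4, 6)
```

The total routine relies only on the general meaning of “+”; it never has to look inside the objects.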
The huge potential got me to look at the “impractical” part, which looked much more doable than I’d thought (it still took about 5+ years and a great research group to do). LISP had already solved a number of the problems, and it proved to be a great set of ideas for context.
In the 1960s, software composites that were more complex than arrays were often called “objects”, and all the schemes I had seen involved structures that included attached procedures. A month or so after the “rotation” someone asked me what I was doing, and I foolishly said “object-oriented programming”.
The foolish part is that “object” is a very bad word for what I had in mind — it is too inert and feels too much like “data”. Simula called its instances “processes” and that is better.
“Process-oriented programming” would have been much better, don’t you think?
In any case, I did not at all have “Abstract Data Types” in mind as a worthwhile goal, even though they were obvious — and this is because “Data” as an idea does not scale at all well.
You are much better off hiding how state is handled inside a “process”, only having processes, and treating processes as “servers” for each other.
That is what I had in mind back then.
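To put that stance in today’s terms: here is a minimal Python sketch (the Account “process” and its message names are just illustrative) of an object that hides its state completely and acts as a little server, answering only the messages it chooses to understand.

```python
# A minimal sketch of "only having processes and treating them as servers":
# the object hides its state entirely and answers only messages.
# The Account class and its messages are invented for this example.

class Account:
    def __init__(self, opening_balance):
        self._balance = opening_balance   # hidden: only this "server" touches it

    def receive(self, message, *args):
        # The only door in: a message is a request, and the object decides
        # for itself how (and whether) to honor it.
        if message == "deposit":
            self._balance += args[0]
            return "ok"
        if message == "balance?":
            return self._balance
        return "not understood"

a = Account(100)
a.receive("deposit", 50)
print(a.receive("balance?"))        # 150
print(a.receive("withdraw", 10))    # not understood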