" There is a very strong selection bias in the kinds of people who will thrive in any regimented system. People who don’t conform to those ideals are filtered out- either in the hiring stages, through layoffs, or through employee churn"... bravo!
The level of bias and conformity in science never ceases to amaze me, even from people who claim to fight against it.
"Retraining models from scratch is the hard path"... I'd say it's insanity. How could they build such models without a memory system for instance, when it's more than obvious that memory is a mechanism that allows to trade time against efficiency, and avoid relearning all the data on Earth with every new model? And still in a non efficient way...
I'd go even further: how can anyone believe it's possible to achieve human intelligence with statistics? Evaluating the probability of a word near other words, seriously? The complexity of word combinations is infinite, so it will work only for the easiest or most obvious things, and only with a ton of data (and they've exhausted all the data available already).
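To be concrete about what "the probability of a word near other words" means, here is a toy sketch of my own (counting instead of neural networks; real LLMs differ in mechanism, but the next-word objective is the same):

```python
# Toy illustration of "evaluating the probability of a word near other words":
# a bigram model estimated by counting. The corpus is made up for illustration;
# real LLMs use neural networks, not raw counts, but optimize the same kind
# of next-token probability.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_prob(prev, nxt):
    total = sum(follows[prev].values())
    return follows[prev][nxt] / total if total else 0.0

print(next_word_prob("the", "cat"))  # 2/3: "cat" follows "the" in 2 of 3 cases
```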
And that's exactly what we're seeing: ChatGPT is very good at the most basic thing, i.e. chatting, conversing. Because the training data is so huge, and language patterns are present everywhere and, above all, limited, it converses well. It seems to perfectly 'understand' what we ask and formulates a perfectly worded answer.
I'm talking about the format, not the content. As soon as things become a bit complex, the accuracy drops, and sometimes turns negative (hallucinations).
And, as you say, those teams of PhDs all suffer from the same issues, coming from academia. No one can see past their own beliefs, but in this case it's a tragedy. They're out of touch with reality, as you say, inebriated by all that money and hype floating around.
As I often say, "it will never work because it cannot work".
Developing models that are basically mappings (i.e. limited right from the start) to learn reality, which has infinite complexity, should be cause for termination. Maybe it's better I'm not in charge :-)
"I'd go even further: how can anyone believe it's possible to achieve human intelligence with statistics? Evaluating the probability of a word near other words, seriously? The complexity of word combinations is infinite, so it will work only for the easiest or most obvious things, and only with a ton of data (and they've exhausted all the data available already)."... You hit the nail right on the head. It is foolish to think that we can model the world even reasonably well by relying on data and training very simplistic models (these models are giant, but their operations are relatively simple).
Totally agree. Words are intrinsically linked to the real world; you can't just look at them and ignore how the objects they describe interact in the physical realm. But it's so 'easy' to use maths and matrices with thousands of dimensions, and humans are so stubborn... The curse of dimensionality won't disappear.
They keep building more and more powerful computation tools instead of, as you say, making models really intelligent. But that would mean knowing what intelligence is (I created a model for that btw).
In the meantime, what they are doing is looking at reality through a peephole that bends and blurs everything, from the wrong side, and then trying somehow to recreate what's on the other side after losing so much information. Because of that, it will never work. Although, I must say, the current results are very impressive. But that's relative; one could also say they are very disappointing.
The current performance, though, tells me that real intelligence will be possible with the right models, but those have to include, at the very least, memory (to avoid relearning the data in the whole universe every time, and to create a hierarchical data structure).
They also make some wrong assumptions, such as with one-shot learning. It is true that humans can learn from even one example, but not when they are 3 weeks or even 3 years old. What they conveniently overlook is that humans can learn in one shot, yes, but only after building a complex data structure over years and billions of little, repetitive experiments (which bear very little resemblance to the repetitive steps of gradient descent), where other humans give the proper feedback for a child to really learn. Basically, those models are a top-down approach, when humans learn in a bottom-up way, at least for the first 10-15 years.
Thanks for your article btw, it was very interesting.
I still don't understand why the AI model itself is not a moat, because it is still too costly to do the unsupervised pretraining of an LLM base model. E.g., Meta's LLaMA required 2048 A100 (80GB) GPUs to train its series of models (7B, 13B, 30B and 65B).
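To put a rough number on "too costly", here is my own back-of-envelope estimate (the ~21 days on 2048 GPUs figure is what the LLaMA paper reports for the 65B model; the dollar rate per GPU-hour is purely an assumption):

```python
# Back-of-envelope pretraining cost for the largest LLaMA model.
# ASSUMPTIONS: ~21 days on 2048 A100s (figure from the LLaMA paper for the
# 65B model); $1.50/GPU-hour is a guessed cloud rate, not a quoted price.
gpus = 2048
days = 21
usd_per_gpu_hour = 1.50

gpu_hours = gpus * days * 24          # ~1.03 million GPU-hours
cost = gpu_hours * usd_per_gpu_hour   # ~$1.5M for compute alone
print(f"{gpu_hours:,.0f} GPU-hours, ~${cost:,.0f}")
```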
Most existing open-source LLMs still rely on the leaked LLaMA weights, supervised fine-tuning them into new models. I.e., if LLaMA had never leaked and that moat had not been removed, the open-source community could only stick with its own self-trained, low-performance 3B/7B models, or even worse, achieve nothing.
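By contrast, fine-tuning those leaked weights is cheap. A minimal sketch of the usual open-source recipe, using Hugging Face transformers with peft/LoRA (the path and hyperparameters here are illustrative assumptions, not any specific project's settings):

```python
# Minimal sketch of supervised fine-tuning a LLaMA-style base model with LoRA,
# the approach behind most open-source derivatives. Path and hyperparameters
# are illustrative assumptions only.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "path/to/llama-7b"  # hypothetical local path to the base weights
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains small low-rank adapter matrices instead of all 7B parameters.
config = LoraConfig(r=8, lora_alpha=16,
                    target_modules=["q_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically <0.1% of the base model
```

Because only the tiny adapters are trained, this runs on a single GPU, which is exactly why the leak dissolved that part of the moat so quickly.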
My impression is: for now and the near future, the AI model itself is still a moat. For the far future (>5 years), to be honest, I don't know. Or is there something I missed?
The reason AI is not a moat is that, ROI-wise, there are almost always better solutions. The discussion around SOTA often misses that performance increases are relative to each other. In real life, these performance increases are often meaningless once the chaos of real data pipelines is considered, especially given the cost.
Also, aside from LLaMA, we have seen models like Open Assistant, Bard, OPT, and Bloom, which are available to play around with to various degrees. Sure, you could argue that without any of those LLMs, the moat around AI models would be stronger. But then again, that's a very different story.
Glad you liked it. Ultimately, I don't think that any of what I predicted then or wrote about now should have been surprising. The fact that no one saw this coming, including the so-called brilliant staff software engineers being paid close to a million dollars a year, shows how out of touch the field is becoming with reality.
" There is a very strong selection bias in the kinds of people who will thrive in any regimented system. People who don’t conform to those ideals are filtered out- either in the hiring stages, through layoffs, or through employee churn"... bravo!
The level of bias and conformity in science never ceases to amaze me, even from people who claim to fight against it.
"Retraining models from scratch is the hard path"... I'd say it's insanity. How could they build such models without a memory system for instance, when it's more than obvious that memory is a mechanism that allows to trade time against efficiency, and avoid relearning all the data on Earth with every new model? And still in a non efficient way...
I'd go even further: how can anyone believe it's possible to achieve human intelligence with statistics? Evaluating the probability of a word near other words, seriously? The word combinations complexity is infinite, so it will work only for the easiest things, or most obvious ones, and a ton of data (and they've exhausted all the data available already).
And that's exactly what we're seeing: chatGPT is very good at the most basic thing, ie chatting, conversing. Because the size of the training data is so huge, and language patterns are present everywhere, and above all, limited, so it converses well. It seems to perfectly 'understand' what we ask and formulate a perfectly worded answer.
I'm talking about the format, not the content. As soon as it becomes a bit complex, the accuracy drops, and sometimes becomes negative (hallucinations).
And, as you say, those teams of PhD's all suffer from the same issues, coming from academia. No one can see past his own beliefs, but in this case, it's a tragedy. They're out of touch with reality as you say. Inebriated by all that money and hype floating around.
As I often say, "it will never work because it cannot work".
Developing models that are basically mapping (ie limited right from the start) to learn reality, which has infinite complexity, should be cause for termination. Maybe it's better I'm not in charge :-)
I'd go even further: how can anyone believe it's possible to achieve human intelligence with statistics? Evaluating the probability of a word near other words, seriously? The word combinations complexity is infinite, so it will work only for the easiest things, or most obvious ones, and a ton of data (and they've exhausted all the data available already). - You hit the nail right on the head. It is foolish to think that we can model the world even reasonably well relying on data, and training on very simplistic models (these models are giant, but their operations are relatively simple).
Totally agree. Words are intrinsically linked to the real world, you can't just look at them and ignore how the objects they describe work with each other in the physical realm, but it's so 'easy' to use maths and matrices with thousands of dimensions, and humans are so stubborn... The curse of dimensionality won't disappear.
They keep building more and more powerful computation tools instead of, as you say, making models really intelligent. But that would mean knowing what intelligence is (I created a model for that btw).
In the meantime, what they are doing is looking at reality through a peephole that bends and blurs everything, from the wrong side, and then they try somehow to recreate what's on the other side, after losing so much information. Because of that, it will never work. Although, I must say, the current results are very impressive. But that's relative, one could also say they are very disappointing.
The current performance though means to me that real intelligence will be possible with the right models, but those have to include, to the very least, memory (to avoid relearning the data in the whole universe every time, and to create a hierarchical data structure).
They also make some wrongful assumptions, such as with one-shot learning. It is true that humans can learn from even 1 example, but not when they are 3 weeks or even 3 years old. What they conveniently overlooked, is that humans can learn in one shot, yes, but after creating a complex data structure over years and billions of little, repetitive, experiments (which have very little resemblance with the repetitive steps of gradient descent) where other humans give the proper feedback for a child to really learn. Basically, those models are a top-down approach, when humans learn in a bottom-up way, at least for the first 10-15 years.
Thanks for your article btw, it was very interesting.
I still don't understand why AI model itself is not a moat, because it is still too costly to unsupervised train a LLM base model. E.g. For now, Meta's LLaMA, it requires A100 (80GB) x 2048 to train a series of models (7b, 13b, 30b and 65b).
Most existing open source LLM models still relies on leaked LLaMA to supervised fine tune as new model. I.e. if LLaMA has no leak and the moat has not been removed, open source communities can only stick on their own self-trained low-performance 3b/7b, or even worst, achieve nothing.
My impression is: For now and near future, AI model itself is still a moat. For far-away future (>5 years), to be honest, I don't know. Or something I missed?
The reason AI is not a moat is because ROI-wise, there are almost always better solutions. The discussion around SOTA often misses out that performance increases are relative to each other. IRL, these performance increases are often meaningless when the chaos of real data pipelines is considered. Esp considering the cost.
Also, aside from LLaMa, we have seen models like Open Assistant, Bard, OPT, and Bloom which are available to play around with to various degrees. Sure you could argue that without any of those LLMs, the moat around AI Models would be stronger. But then again, that's a very different story.
Glad you liked it. Ulimately, I don't think that any of what I had predicted then or wrote about now should have been surprising. The fact that noone saw this coming- including the so called brilliant staff software engineers being paid close to million dollars/year shows how out of touch the field is becoming with reality