China's AI progress is so rapid that Americans are beginning to doubt reality

Text｜Lanxi

In the past few days, I have clearly felt that the English-speaking tech community is half shocked and half confused by the progress of China’s AI industry. The shock has two main sources: Unitree’s wheeled robot dog B2-W, and the open-source MoE model DeepSeek-V3.

In its early years, Unitree was basically a follower of Boston Dynamics. Its product form was a close copy, and commercially it aimed at a low-end niche that did not have much appeal. Starting with the B series, however, Unitree’s robot dogs have been on par with Boston Dynamics in agility.

The surprise of the B2-W is that it switched technical routes, replacing the four-legged design still used by the B2 with a wheeled design that is more agile but also harder to balance, and then within a year completed training for crossing mountains and rivers in outdoor environments. Many Americans commented under the video that it must be CGI, and it is hard to tell whether they meant it or had simply lost their composure.

Boston Dynamics has also briefly tried wheels on robot dogs. Indeed, it has tested far more designs than Unitree, since the company has been around much longer. Yet as the industry’s pioneer, it could not even remain an American company.

Hyundai Motor bought Boston Dynamics from SoftBank at a discount in 2020, just as SoftBank was suffering huge book losses and needed to recoup cash. SoftBank had originally bought it from Google in 2017. And why did Google sell? Because it felt the company was too expensive to run and it could not afford the losses.

This reason is outrageous. The U.S. venture capital system has the highest tolerance for losses in the world, and burning money on cutting-edge research is extremely common (just look at the input-output ratio of Silicon Valley’s AI efforts over the past two years). So why was Boston Dynamics, with its unique position, sold off as a non-performing asset?

There is an elephant in the room that the U.S. technology industry generally pretends not to see: Americans today, from investment banks to companies, from CEOs to programmers, from New York to the Bay Area, have developed an instinctive aversion to manufacturing.

a16z partner Marc Andreessen published the widely circulated essay “Why Software Is Eating the World” in the Wall Street Journal in 2011. Its rough argument is that software companies, with their extremely low marginal costs, are destined to take over every fertile pasture, and that compared with other industries this kind of business can deliver exponential growth.

It is not that Marc Andreessen was wrong. The reality of the past decade or so has indeed shown that this path of capturing profits at scale yields the highest returns. However, Americans’ path dependence will inevitably end with a whole generation losing the ability to manufacture.

The loss of manufacturing capability mentioned here does not mean a loss of interest in or enthusiasm for manufacturing. Some time ago, I visited a reverse cross-border purchasing company in Shenzhen. Its business is to turn Huaqiangbei’s electronic components into an indexable, structured catalog and then provide a full-process purchasing service from inspection to delivery. Its biggest buyers are the American DIY market and college students. The reason they have to entrust people in China, thousands of miles away, to buy things for them is that in the United States itself there is no local supply chain at all.

And those students only get the chance to actually build something while they are studying. Once they join a big company and start drawing a salary, nobody wants to get their hands dirty anymore.

But software cannot run without hardware, however low the added value of hardware production may be. Because they hold the entry point for collecting first-hand physical data, manufacturers can stand tall and develop a full set of solutions themselves; it only depends on whether they can assemble a good team of engineers. The reverse does not work. Once manufacturing orders have been outsourced for long enough, the supporting industrial chain forms elsewhere and cannot be brought back.

Therefore, prototypes of emerging technologies such as multi-rotor drones and quadruped robot dogs are generally produced in Europe and the United States, which have the capital for trial and error; this is the so-called “from zero to one” process. In the “from one to ten” implementation stage, China’s catch-up results begin to appear in quick succession. And once mass production, “from ten to one hundred”, begins, China’s supply chain costs simply kill off the competition.

When Boston Dynamics’ robots first went viral on the Internet, those in charge at Google, the parent company, were not only displeased but wanted to distance themselves from it. Now it is clear where that concern came from: the feeling that it was beneath Google, a software giant, to roll up its sleeves and do manufacturing work.

Of course, there are still builders like Musk in the United States. But the reason Musk’s story is so compelling is that people like him are now extremely rare, and for a long time he was not welcomed by the mainstream technology industry. He built his reputation step by step entirely on achievements that defied conventional wisdom: building cars, rockets, and tunnels, all things that Silicon Valley goes out of its way to avoid.

If Unitree has set off a wave of doubting reality in hardware, then DeepSeek has the large-model vendors firmly in its grip on software’s home turf.

While Microsoft, Meta, and Google were all rushing to train large models on 100,000-GPU clusters, DeepSeek spent less than $6 million and 2 months on 2,000 GPUs to achieve test results on par with GPT-4o and Claude 3.5 Sonnet.
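As a rough sanity check on these figures, the quoted budget can be converted into an implied price per GPU-hour. The sketch below assumes that “2 months” means 60 days and that all 2,000 GPUs were busy around the clock; both are illustrative assumptions rather than numbers published by DeepSeek, and the $6 million is treated as an upper bound.

```python
# Back-of-envelope check of the training budget quoted in the text.
# Assumptions (not from DeepSeek): "2 months" = 60 days, full utilisation.
GPUS = 2_000
DAYS = 60
BUDGET_USD = 6_000_000  # upper bound quoted in the text

gpu_hours = GPUS * DAYS * 24          # total GPU-hours consumed
implied_rate = BUDGET_USD / gpu_hours  # dollars per GPU-hour

print(f"GPU-hours: {gpu_hours:,}")
print(f"Implied cost per GPU-hour: ${implied_rate:.2f}")
```

Under these assumptions the budget works out to roughly $2 per GPU-hour, which suggests the headline figure is best read as the compute bill for a single training run rather than the total cost of the project.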

DeepSeek-V2 became popular half a year ago, but the narrative then still fit the old expectations: a Chinese AI company launches a low-cost open-source model and wants to become the industry’s price butcher. The Chinese are good at making cheap and durable things like this, and as long as you do not compare them with top products, they will do the job.

But V3 is completely different. It cuts costs by more than a factor of 10 while delivering quality comparable to the top-tier camp, and, crucially, it is open source. The replies under the related tweets are all “How did the Chinese do it?”

Admittedly, late-arriving large models can train more cost-effectively through knowledge distillation and similar means. The falling learning curve favors the pursuer, in the same way that learning Newton’s three laws is certainly faster and cheaper for you than working them out was for Newton himself. But an efficiency gain this striking is hard to explain with known training methods alone. DeepSeek must have made innovations in the underlying architecture that set it apart from the giants.

Another angle is more interesting: if the final consequence of the AI chip ban on China is that China’s large-model companies are forced to find more efficient solutions under limited computing power, this backfiring plot would be all too ironic.

Liang Wenfeng, the founder of DeepSeek, has also said before that the company’s problem has never been money but the embargo on high-end chips.

Big Chinese companies building large models, such as ByteDance and Alibaba, can get by: putting a tenth of their annual income into AI is not a big problem for them. Start-ups, however, do not have that much ammunition, and the only way for them to stay at the poker table is to innovate desperately.

Kai-Fu Lee has also been making a point this year: China’s advantage in AI has never been breakthrough research with no budget cap, but finding the optimal balance among good, fast, cheap, and reliable.

Both 01.AI and DeepSeek use the MoE (Mixture of Experts) approach, which is roughly equivalent to doing targeted training on high-quality datasets prepared in advance. It cannot be said that the benchmark scores contain no padding, but the market does not care about the principle: as long as the ratio of quality to price is good enough, the model will be competitive.
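For readers unfamiliar with the term, the generic idea behind MoE is that a small gating function sends each input to only a few expert sub-networks, so most of the model’s parameters sit idle for any given token. Below is a minimal sketch of top-k routing. The toy experts and the gate scores are made-up illustrative values, and the code does not reproduce the actual architecture of DeepSeek or 01.AI.

```python
import math

def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k=2):
    """Return [(expert_index, weight)] for the k highest-scoring experts."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalise over the chosen experts
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_scores, k))

# Hypothetical toy "experts"; in a real model each would be a neural network.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2, lambda x: -x]
gate_scores = [0.1, 2.0, 2.0, -1.0]  # made-up gate outputs for one token

print(route_top_k(gate_scores))
print(moe_forward(3.0, experts, gate_scores))
```

Here only two of the four experts run for the input, which is where the compute saving of this family of models comes from: total parameter count can grow while the work done per token stays roughly fixed.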

Of course, what sets DeepSeek apart is that it is not short of cards. It had stockpiled 10,000 NVIDIA A100 cards by 2021, before ChatGPT had even appeared. Much as Meta stockpiled cards for the metaverse and accidentally caught the AI wave, DeepSeek bought all those cards in order to do quantitative trading…

My earliest impression of Liang Wenfeng came from the preface he wrote for “The Biography of Simons”. Simons was the founder of Renaissance Technologies and a pioneer in using algorithmic models for automated investing. Liang Wenfeng was then running a quantitative private fund managing 60 billion yuan, so it was natural for him to write a preface paying tribute to the founder of the industry.

I explain this background to make a point: the move of Liang Wenfeng’s companies from quantitative trading to large-model development is not a shift from finance to technology, but the same mathematical skills switching between two application scenarios. The purpose of investing is to predict the market, and the principle of a large model is likewise to predict tokens.

Later, I read several interviews with Liang Wenfeng and came away with a very good impression of him. He is a very sober and intelligent person. Here are a few passages so you can get a feel for him:

“Undercurrent”: Most Chinese companies choose to have both models and applications. Why does DeepSeek currently choose to only do research and exploration?

Liang Wenfeng: Because we feel that the most important thing now is to participate in the wave of global innovation. For many years, Chinese companies have been used to others making the technological innovations while we take them and monetize applications, but this should not be taken for granted. In this wave, our starting point is not to seize the chance to make a fortune, but to go to the forefront of technology and promote the development of the entire ecosystem.

“Undercurrent”: The ingrained perception that most people took away from the Internet and mobile Internet era is that the United States is good at technological innovation, while China is better at applications.

Liang Wenfeng: We believe that with economic development, China must gradually become a contributor instead of always being a free rider. During the IT wave of the past thirty years or so, we have basically not participated in real technological innovation. We have become accustomed to Moore’s Law falling from the sky, to sitting at home and having better hardware and software arrive after 18 months. Scaling Law is being treated the same way. But in fact, this is something that the Western-dominated technology community has worked tirelessly for generations to create. It is only because we did not take part in this process before that we overlooked its existence.

“Undercurrent”: But in the Chinese context, this choice is too extravagant. Large models are a heavy-investment game, and not every company has the capital to pursue only research and innovation without first considering commercialization.

Liang Wenfeng: The cost of innovation is definitely not low, and the past habit of simply taking what others had made was tied to the national conditions of the time. But now, whether you look at the size of China’s economy or the profits of major companies such as ByteDance and Tencent, none of them is low by global standards. What we lack for innovation is definitely not capital, but confidence and the knowledge of how to organize a high density of talent to innovate effectively.

“Undercurrent”: But when it comes to large models, it is difficult to form an absolute advantage simply by leading in technology. What is the bigger thing that you are betting on?

Liang Wenfeng: What we see is that Chinese AI cannot stay in the position of follower forever. We often say that there is a one- or two-year gap between China’s AI and that of the United States, but the real gap is the difference between originality and imitation. If this does not change, China will always be a follower, so some exploration is unavoidable. NVIDIA’s leadership is not just the effort of one company, but the result of the joint efforts of the entire Western technology community and industry. They can see the next generation of technology trends and have a roadmap in hand. The development of AI in China also requires such an ecosystem. Many domestic chips cannot develop because they lack a supporting technical community and have only second-hand information. Therefore, China must have someone standing at the forefront of technology.

“Undercurrent”: Many large model companies are persistent in poaching people overseas. Many people think that the top 50 talents in this field may not be in Chinese companies. Where do your people come from?

Liang Wenfeng: No one who worked on the V2 model came back from overseas; they are all local. The top 50 talents may not be in China, but maybe we can develop such people ourselves.

“Undercurrent”: So you are also optimistic about this matter?

Liang Wenfeng: I grew up in a fifth-tier city in Guangdong in the 1980s. My father was a primary school teacher. In the 1990s there were many opportunities to make money in Guangdong, and many parents came to our home at the time, most of them convinced that studying was useless. But looking back now, attitudes have changed. Money is hard to make, and even the chance to drive a taxi may be gone. That changed within one generation. There will be more and more hard-core innovation in the future. It may not be easy to understand now, because society as a whole needs to be educated by the facts. When this society lets people who do hard-core innovation become successful, collective thinking will change. We just need a body of facts and a process.

⋯⋯

Isn’t it awesome? Anyway, I am a fan. Doing the hardest things, making money while standing tall, with every conviction grounded in respect for and judgment of real value. It is very reassuring to see more and more people born in the 80s and 90s stepping onto the mainstream stage. You can say that they used to be the so-called “small town problem solvers”, but what is wrong with solving problems? Participating in shaping the future of the world is the most challenging problem of all, and enjoying problems like that is where the fun lies.
