Tesla CEO Elon Musk has been testing a neural network training computer called Dojo since at least 2019. Musk says Dojo will be able to process vast amounts of video data to achieve vision-only autonomous driving. While Dojo is still in development, Tesla today revealed a new supercomputer that will serve as a development prototype of what Dojo will eventually offer.
At the 2021 Conference on Computer Vision and Pattern Recognition on Monday, Tesla’s head of artificial intelligence Andrej Karpathy revealed the company’s new supercomputer, which allows the automaker to ditch radar and lidar sensors in its autonomous cars in favor of high-quality optical cameras. During his workshop on autonomous driving, Karpathy explained that for a computer to respond to a new environment the way a human can, it requires an immense data set and an enormously powerful supercomputer to train the company’s neural network-based autonomous driving technology on that data. Hence the development of these Dojo predecessors.
Tesla’s next-generation supercomputer has 10 petabytes of “hot tier” NVMe storage running at 1.6 terabytes per second, according to Karpathy. At 1.8 EFLOPS, he said it could be the world’s fifth most powerful supercomputer, but he later admitted that his team has yet to run the specific benchmark needed to enter the TOP500 supercomputer ranking.
“With that said, if you take the total number of FLOPS, it would actually rank somewhere around fifth place,” Karpathy told TechCrunch. “Fifth is currently held by NVIDIA with its Selene cluster, which has a very comparable architecture and a similar number of GPUs (4,480 vs. our 5,760, slightly less).”
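As a rough, back-of-envelope check (using only the figures quoted above, and assuming the 1.8 EFLOPS claim covers all 5,760 GPUs evenly), dividing the total throughput by the GPU count gives the implied per-GPU performance:

```python
# Illustrative arithmetic based on the figures quoted in the article.
total_flops = 1.8e18  # claimed 1.8 EFLOPS
num_gpus = 5_760      # GPU count cited by Karpathy

per_gpu_tflops = total_flops / num_gpus / 1e12
print(f"{per_gpu_tflops:.1f} TFLOPS per GPU")  # 312.5 TFLOPS per GPU
```

That works out to roughly 312 TFLOPS per GPU, which helps explain why Karpathy calls the architecture “very comparable” to NVIDIA’s Selene cluster.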
Musk has been advocating a vision-only approach to autonomy for some time, largely because cameras are faster than radar or lidar. As of May, Tesla Model Y and Model 3 vehicles in North America are being built without radar, relying on cameras and machine learning to support their advanced driver assistance and autopilot system.
Many autonomous driving companies use LIDAR and high-definition maps, which means they require incredibly detailed maps of where they operate, including all lanes of the road and how they connect, traffic lights, and more.
“The approach we take is based on vision, using mainly neural networks that, in principle, can work anywhere on earth,” Karpathy said in his workshop.
Replacing a “meat computer”, or rather a human, with a silicon computer results in lower latencies (better reaction time), 360-degree situational awareness, and a fully attentive driver who never checks their Instagram, Karpathy said.
Karpathy shared some scenarios of how Tesla’s supercomputer uses computer vision to correct driver misbehavior, including an emergency braking scenario in which the computer’s object detection kicks in to prevent a pedestrian from being struck, and a traffic control warning that can identify a yellow light in the distance and send an alert to a driver who has not yet begun to slow down.
Tesla vehicles have also tested a feature called pedal misapplication mitigation, in which the car identifies pedestrians in its path, or even the lack of a driving path, and responds when the driver accidentally steps on the accelerator instead of the brake, potentially saving pedestrians in front of the vehicle or preventing the driver from accelerating into a river.
Tesla’s supercomputer collects video from eight cameras surrounding the vehicle at 36 frames per second, providing incredible amounts of information about the environment around the car, Karpathy explained.
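To put that in concrete terms (simple arithmetic on the numbers quoted above), eight cameras each capturing 36 frames per second means the system must ingest and process hundreds of frames every second per vehicle:

```python
# Per-vehicle frame throughput from the figures in the article.
cameras = 8  # cameras surrounding the vehicle
fps = 36     # frames per second per camera

frames_per_second = cameras * fps
print(frames_per_second)  # 288 frames ingested per second, per vehicle
```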
While the vision-only approach is more scalable than collecting, building, and maintaining high-definition maps around the world, it is also a much greater challenge, because the neural networks that detect objects and steer the car have to be able to collect and process vast amounts of data fast enough to match a human’s speed- and depth-recognition capabilities.
Karpathy says that after years of research, he believes the challenge can be met by treating it as a supervised learning problem. Engineers testing the technology found they could drive through sparsely populated areas without intervention, Karpathy said, but “they definitely struggle a lot more in very harsh environments like San Francisco.” For the system to really work well and mitigate the need for things like high-definition maps and additional sensors, it will have to get much better at handling densely populated areas.
One of the Tesla AI team’s game changers has been automatic labeling, through which it can automatically tag things like road hazards and other objects across millions of videos captured by the cameras on Tesla vehicles. Large AI data sets have often required extensive manual labeling, which is time-consuming, especially when it comes to producing the kind of cleanly labeled data set that a supervised learning system needs for a neural network to work well.
With this latest supercomputer, Tesla has amassed 1 million videos of around 10 seconds each and labeled 6 billion objects with depth, velocity, and acceleration. All of this takes up a whopping 1.5 petabytes of storage. That sounds like a huge amount, but it will take much more before the company can achieve the reliability it requires from an automated driving system that relies on vision alone, hence the need to keep developing increasingly powerful supercomputers in Tesla’s quest for more advanced AI.
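For a sense of scale (arithmetic on the figures above only; actual encoding and storage layout are unknown), 1 million 10-second clips amount to thousands of hours of footage, and 1.5 petabytes spread across them implies an average of about 1.5 GB per clip:

```python
# Rough dataset arithmetic from the article's figures (illustrative only).
clips = 1_000_000
clip_seconds = 10
total_bytes = 1.5e15  # 1.5 petabytes

hours = clips * clip_seconds / 3600
bytes_per_clip = total_bytes / clips
print(f"{hours:,.0f} hours of video, ~{bytes_per_clip / 1e9:.1f} GB per clip")
```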