Kirill Solodskih, PhD, is the Co-Founder and CEO of TheStage AI. With over a decade of experience in AI research and entrepreneurship, he has a rich background in optimizing neural networks for real-world business applications. In 2024, he co-founded TheStage AI, securing $4.5 million in funding to automate neural network acceleration across various hardware platforms.
Before founding TheStage AI, Kirill served as a Team Lead at Huawei, where he advanced AI camera applications for Qualcomm NPUs, enhancing the performance of the P50 and P60 smartphones. His contributions earned him several patents, and his research has received accolades at top conferences such as CVPR and ECCV. Kirill also hosts a podcast on AI optimization and inference.
What inspired you to co-found TheStage AI, and how did you transition from academia and research to tackling inference optimization as a startup founder?
TheStage AI’s roots lie in my work at Huawei, where I focused on automating deployments and optimizing neural networks. These experiences highlighted the real-world challenges of model deployment: moving a model from the training phase to real-world usability presents significant hurdles. It’s crucial to minimize neural network parameters without sacrificing performance, which remains a challenging mathematical problem ripe for innovation.
Manual inference optimization has long been a bottleneck in AI. Can you explain how TheStage AI automates this process and why it’s a game-changer?
TheStage AI addresses the inefficiencies of manual neural network compression and acceleration. Our Automated Neural Networks Analyzer (ANNA) identifies non-critical layers to optimize, akin to the automated process of ZIP file compression, speeding up AI adoption and reducing costs.
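To make the idea concrete, here is a toy sketch of how an analyzer might rank layers by importance and flag the least critical ones for compression. The scoring heuristic, function names, and data are all illustrative assumptions, not TheStage AI's actual ANNA API.

```python
# Hypothetical layer-importance analysis: score each layer by mean
# absolute weight magnitude, then mark the lowest-scoring layers as
# compression candidates. Purely illustrative.

def layer_importance(weights):
    """Mean absolute weight value as a crude importance proxy."""
    return sum(abs(w) for w in weights) / len(weights)

def pick_compression_candidates(layers, keep_ratio=0.5):
    """Return names of the lowest-scoring layers to compress."""
    scored = sorted(layers.items(), key=lambda kv: layer_importance(kv[1]))
    n_compress = int(len(scored) * (1 - keep_ratio))
    return [name for name, _ in scored[:n_compress]]

layers = {
    "conv1": [0.9, -1.2, 0.8],
    "conv2": [0.01, -0.02, 0.03],   # near-zero weights: likely redundant
    "fc":    [0.5, 0.6, -0.4],
}
print(pick_compression_candidates(layers))  # → ['conv2']
```

Real analyzers use far richer signals than weight magnitude (sensitivity to pruning, quality impact on validation data), but the shape of the task is the same: automatically find where a network can shrink without hurting output quality.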
TheStage AI claims to reduce inference costs by up to 5x — what makes your optimization technology so effective compared to traditional methods?
TheStage AI’s optimization strategy surpasses traditional methods by dissecting neural networks into smaller components and applying tailored algorithms for optimal compression. Our approach enhances scalability and model quality, integrating flexible compiler settings for hardware-specific optimization, such as with iPhones or NVIDIA GPUs.
How does TheStage AI’s inference acceleration compare to PyTorch’s native compiler, and what advantages does it offer AI developers?
Unlike PyTorch’s “just-in-time” compilation, which can delay deployment, TheStage AI allows models to be pre-compiled for immediate deployment. This accelerates rollouts, improves service efficiency, and reduces costs, making AI models more scalable and responsive.
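The deployment trade-off described above can be sketched with a toy example: a lazily compiled model pays the compilation cost on its first request, while a pre-compiled one pays it at build time, before any traffic arrives. This is a generic illustration and does not use PyTorch or TheStage AI's compiler; the class names and the simulated compile step are assumptions.

```python
import time

def slow_compile(fn):
    """Stand-in for a real compilation pass."""
    time.sleep(0.1)
    return fn

class JITModel:
    """Compiles lazily: the first request pays the compile cost."""
    def __init__(self, fn):
        self.fn = fn
        self.compiled = None

    def __call__(self, x):
        if self.compiled is None:          # cold start on first call
            self.compiled = slow_compile(self.fn)
        return self.compiled(x)

class PrecompiledModel:
    """Compiles ahead of time: every request, including the first, is fast."""
    def __init__(self, fn):
        self.compiled = slow_compile(fn)   # cost paid before serving

    def __call__(self, x):
        return self.compiled(x)
```

In a serving context the difference matters because the first user request hits the cold-start penalty under lazy compilation, whereas ahead-of-time compilation moves that cost out of the request path entirely.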
Can you share more about TheStage AI’s QLIP toolkit and how it enhances model performance while maintaining quality?
QLIP is a versatile Python library that provides tools for building and implementing new optimization algorithms across different hardware. It allows AI engineers to quickly adapt and integrate cutting-edge research into their models, offering flexibility beyond traditional frameworks.
You’ve contributed to AI quantization frameworks used in Huawei’s P50 & P60 cameras. How did that experience shape your approach to AI optimization?
Working on AI quantization for Huawei’s devices taught me the power of automation in achieving speed without compromising quality. Automating the quantization process significantly reduced development time, allowing us to adapt quickly to new hardware.
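As a rough illustration of the general technique, here is a minimal symmetric 8-bit quantization sketch: floats are mapped to small integers with a shared scale, so each weight fits in one byte instead of four. This is a generic textbook scheme, not Huawei's or TheStage AI's actual pipeline.

```python
# Minimal symmetric int8 quantization sketch (illustrative only).

def quantize_int8(values):
    """Map floats to int8 range [-127, 127] with a shared scale."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the quantized integers."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.0, 0.8]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored value is within half a quantization step of the
# original, at a quarter of the memory footprint.
```

Automating steps like calibrating the scale per layer, rather than hand-tuning it, is what turns this from a slow manual process into something that can track new hardware quickly.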
Your research has been featured at CVPR and ECCV — what are some of the key breakthroughs in AI efficiency that you’re most proud of?
One key achievement was a paper selected for an oral presentation at CVPR 2023. It focused on the analysis and compression of neural networks, offering innovative algorithms and insights into neural network parameter requirements.
Can you explain how Integral Neural Networks (INNs) work and why they’re an important innovation in deep learning?
INNs improve traditional neural networks by allowing dynamic resizing based on resources, akin to compressing and expanding a blanket. This results in better compression and quality maintenance, offering practical results with minimal effort.
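The core idea can be sketched in a few lines: treat a layer's weights as samples of a continuous function, so the same layer can be re-sampled at any width. Linear interpolation below is a simplifying assumption standing in for the smooth parameterization of the actual research; the function name and data are illustrative.

```python
# Illustrative sketch of the "continuous weights" idea behind INNs:
# re-sample a 1-D weight vector to any target length.

def resample(weights, new_size):
    """Linearly interpolate a weight vector to a new length."""
    if new_size == 1:
        return [weights[0]]
    out = []
    step = (len(weights) - 1) / (new_size - 1)
    for i in range(new_size):
        x = i * step
        lo = int(x)
        hi = min(lo + 1, len(weights) - 1)
        frac = x - lo
        out.append(weights[lo] * (1 - frac) + weights[hi] * frac)
    return out

w = [0.0, 1.0, 0.0, -1.0]      # "full-size" layer weights
small = resample(w, 2)         # shrink for a constrained device
large = resample(w, 7)         # expand when resources allow
```

The practical appeal is that one trained model can be deployed at many sizes without retraining a separate network for each hardware budget.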
TheStage AI has worked on quantum annealing algorithms — how do you see quantum computing playing a role in AI optimization in the near future?
Quantum computing introduces new approaches to AI optimization, enabling precise problem-solving that classical systems can’t match. While current quantum architectures can’t load neural networks directly, quantum computing could revolutionize optimization through its inherent parallelism.
What is your long-term vision for TheStage AI? Where do you see inference optimization heading in the next 5-10 years?
TheStage AI aims to be a