The interview process focused heavily on machine learning and LLM-related topics. They asked about different ML algorithms, how a large language model can be served in production at scale, and how to deploy an LLM either locally or in the cloud. They also went into how to handle multiple requests from the same user efficiently in a real-time system.
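The write-up does not record what answer the interviewers were looking for on the multiple-requests question. One common idea, sketched here purely as an illustration, is to coalesce identical in-flight requests from the same user so the model runs only once. The names `RequestCoalescer` and `fake_llm` are hypothetical, and `fake_llm` stands in for a real model call.

```python
import asyncio


class RequestCoalescer:
    """Share one in-flight model call among identical requests from a user."""

    def __init__(self, handler):
        self._handler = handler   # async function: prompt -> response
        self._in_flight = {}      # (user_id, prompt) -> asyncio.Task

    async def submit(self, user_id, prompt):
        key = (user_id, prompt)
        task = self._in_flight.get(key)
        if task is None:
            # First request with this key: start the actual call.
            task = asyncio.create_task(self._handler(prompt))
            self._in_flight[key] = task
            # Forget the entry once the call finishes, so later requests rerun.
            task.add_done_callback(lambda _t, k=key: self._in_flight.pop(k, None))
        # Duplicate requests simply wait on the same task.
        return await task


calls = []  # records how many times the (fake) model actually ran


async def fake_llm(prompt):
    # Hypothetical stand-in for a real inference call.
    calls.append(prompt)
    await asyncio.sleep(0.01)
    return f"echo: {prompt}"


async def demo():
    coalescer = RequestCoalescer(fake_llm)
    return await asyncio.gather(
        coalescer.submit("u1", "hi"),
        coalescer.submit("u1", "hi"),   # duplicate, reuses the first call
        coalescer.submit("u1", "bye"),
    )


results = asyncio.run(demo())
```

In this toy run the three submissions trigger only two model calls. A real system would likely combine this with other techniques such as rate limiting, queuing, or batching, none of which are shown here.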
There were in-depth technical questions about past projects, and they expected detailed explanations of the architecture, the challenges faced, and the optimization techniques used. They also asked how a RAG (retrieval-augmented generation) system works and how it integrates with an LLM to improve its answers. A good understanding of optimization methods such as quantization and distillation, and of model serving strategies, was expected.
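For anyone preparing for the RAG question, the core idea can be shown in a few lines: retrieve the passages most relevant to the query, then place them in the prompt sent to the LLM. This is only a minimal sketch under simplifying assumptions. It uses bag-of-words cosine similarity in place of a learned embedding model and a vector database, the documents in `DOCS` are made up for illustration, and it stops at building the prompt instead of calling a model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two token-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=1):
    # Rank documents by similarity to the query and keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=1):
    # Augment the question with the retrieved context before generation.
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Made-up knowledge base for illustration only.
DOCS = [
    "Quantization stores model weights in fewer bits to reduce memory use.",
    "Distillation trains a small student model to imitate a large teacher model.",
    "Batching groups several requests into one forward pass to raise throughput.",
]

query = "What does quantization do to model weights?"
top = retrieve(query, DOCS, k=1)
prompt = build_prompt(query, DOCS, k=1)
```

Here the quantization passage is retrieved and placed ahead of the question. In a production setup the similarity function would typically be replaced by dense embeddings, and `prompt` would be passed to the LLM for generation.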
In addition to the theory, there were two Python coding questions.
Overall, the interview was deep and technical, mostly focused on applied machine learning, LLMs, and production-level deployment.