Pregunta de entrevista de Google

How would you optimize a large- scale model for real-time inference while maintaining a strict latency budget