NVIDIA interview question

Tell me how you can conserve GPU memory when running inference on LLMs.