Performance Issues with Meta Llama 3.1 405B Model for Large Token Inputs

Hello Team,

I am currently using the meta.llama3-1-405b-instruct-v1:0 model on AWS Bedrock, and I have noticed significant slowdowns when processing large amounts of text. My application needs to handle long pieces of text, and while the model gives accurate and detailed responses, latency grows substantially as the input size increases.
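For context, a stripped-down version of my call looks roughly like this (boto3 against the Bedrock Runtime; the region and parameter values are illustrative):

```python
import json

import boto3

# Bedrock Runtime client; the region here is illustrative
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

MODEL_ID = "meta.llama3-1-405b-instruct-v1:0"

def invoke(prompt: str) -> str:
    # Llama-family models on Bedrock take a JSON body with "prompt"
    # plus optional inference parameters
    body = json.dumps({
        "prompt": prompt,
        "max_gen_len": 512,   # upper bound on generated tokens
        "temperature": 0.5,
        "top_p": 0.9,
    })
    response = bedrock.invoke_model(modelId=MODEL_ID, body=body)
    result = json.loads(response["body"].read())
    return result["generation"]
```

The response body also reports prompt_token_count and generation_token_count, which is how I arrived at the averages below.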

On average, my inputs are around 3,000 tokens, and the model's responses are typically around 350 tokens.
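To quantify the slowdown, I've been timing requests roughly like this, using the streaming variant so I can separate time-to-first-chunk from total generation time (a sketch; the chunk count is only an approximation of output length):

```python
import json
import time

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "meta.llama3-1-405b-instruct-v1:0"

def time_request(prompt: str) -> None:
    body = json.dumps({"prompt": prompt, "max_gen_len": 512})
    start = time.perf_counter()
    response = bedrock.invoke_model_with_response_stream(
        modelId=MODEL_ID, body=body
    )
    first_chunk_at = None
    chunks = 0
    for event in response["body"]:
        if "chunk" not in event:
            continue
        json.loads(event["chunk"]["bytes"])  # decoded but unused here
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start
        chunks += 1
    total = time.perf_counter() - start
    print(f"first chunk: {first_chunk_at:.2f}s  total: {total:.2f}s  chunks: {chunks}")
```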

Request for Assistance:

  1. Are there any recommended settings or configuration changes that would help the model handle large inputs more quickly and efficiently?
  2. Can I tune the inference parameters to improve speed without degrading response quality? (A sketch of the parameters I'm currently setting is after this list.)
  3. Are there any updates or future improvements planned that would make the model perform better with large token inputs and outputs?
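For question 2, these are the only request-body parameters I'm aware of for Llama models on Bedrock; my assumption is that lowering max_gen_len helps most, since output tokens are decoded sequentially:

```python
# Inference parameters the Llama request body accepts on Bedrock, as far as
# I know. Assumption: max_gen_len dominates generation latency, since each
# output token is decoded one at a time.
fast_params = {
    "max_gen_len": 350,   # capped at my typical response length
    "temperature": 0.2,   # lower temperature; effect on quality unclear to me
    "top_p": 0.9,
}
```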