Are you considering supporting the Gemini 1.5 API as soon as it becomes available? For large codebases it seems better than GPT-4 or the open-source models. (When we run Cursor without our own OpenAI API key, you use a model comparable in quality to GPT-4 but it's not the OpenAI model behind the scenes, correct? Something like Mixtral/Llama/that kind of thing? Was there a setting for this, or am I hallucinating lol.)
It would be a surreal experience to code with the power of your embeddings feature plus Gemini 1.5's navigation/reasoning across the content supplied to it by your current underlying tech. For example, the current AIs would find and summarize the 'relevant' information from the embeddings/docs/codebase, then feed it to Gemini to make the final decisions and reply to the prompt with refactoring, bug fixing, or whatever is needed. What do you think of this idea?
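Roughly this kind of two-stage pipeline is what I mean: a cheap embeddings pass to pick the relevant chunks, then one big call to the long-context model. Just a toy sketch using an OpenAI-compatible client; the "gemini-1.5-pro" model name, the prompts, and the chunking are all placeholders, not how Cursor actually works:

```python
# Toy sketch: retrieve relevant chunks via embeddings, then send them to a
# long-context "reasoning" model. Model name below is a placeholder.
from openai import OpenAI
import numpy as np

client = OpenAI()  # assumes OPENAI_API_KEY is set

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def top_k_chunks(question: str, chunks: list[str], k: int = 20) -> list[str]:
    q = embed([question])[0]
    c = embed(chunks)
    scores = c @ q / (np.linalg.norm(c, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(-scores)[:k]]

def answer(question: str, chunks: list[str]) -> str:
    context = "\n\n".join(top_k_chunks(question, chunks))
    resp = client.chat.completions.create(
        model="gemini-1.5-pro",  # placeholder: whatever long-context endpoint is available
        messages=[
            {"role": "system", "content": "Answer using only the provided code context."},
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```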
From the three.js YouTube video example, it seems like working with medium/large projects will become 10-20x faster if the model can hold 1M tokens in memory during the process. And if you add dynamic arrangement of what's in context, using local lightweight models that index all classes/functions and know how to quickly assemble a 'relevant current context' from the IDE and send it to Gemini (or from the backend, if all the indexing/embedding happens server-side, as I understand from the bit of docs I've read), that would turn the 1M of space into a 'highly relevant 1M', which would be an amazing experience. I saw people on Twitter commenting that they're in a hurry to buy Google stock after seeing this, haha. Good point: Google has the manpower, brainpower, and datacenter capacity to give OpenAI a fight for a slice of that AI pie.
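To show what I mean by "index all classes/functions and assemble a relevant context", here's a deliberately dumb sketch: walk the project with Python's `ast` module, pull out class/function definitions, and pack the ones that look relevant into a token budget. The keyword-overlap scoring and the 4-chars-per-token estimate are just illustrations, nothing to do with Cursor's real indexing:

```python
# Toy "local lightweight indexing": extract class/function definitions,
# rank them by crude keyword overlap with the query, and pack the best
# ones into a token budget before sending to a long-context model.
import ast, pathlib

def index_project(root: str) -> list[dict]:
    symbols = []
    for path in pathlib.Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8")
        try:
            tree = ast.parse(text)
        except SyntaxError:
            continue
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                symbols.append({
                    "name": node.name,
                    "file": str(path),
                    "source": ast.get_source_segment(text, node) or "",
                })
    return symbols

def assemble_context(query: str, symbols: list[dict], budget_tokens: int = 1_000_000) -> str:
    words = set(query.lower().split())
    ranked = sorted(symbols, key=lambda s: -len(words & set(s["name"].lower().split("_"))))
    parts, used = [], 0
    for s in ranked:
        est = len(s["source"]) // 4  # rough "4 chars per token" estimate
        if used + est > budget_tokens:
            break
        parts.append(f"# {s['file']}:{s['name']}\n{s['source']}")
        used += est
    return "\n\n".join(parts)
```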
Can we use it in Cursor? Honestly, I browsed the settings and didn't find where to put an OpenAI API key or anything. I think I'm such a dummy that even the AI won't help me do what I want to do.
As long as GPT-4 Turbo in Cursor only uses a 10k context, I'm not sure Gemini will make any difference regardless of the model itself; we still won't get a considerably larger context…
Maybe one day the Cursor editor will allow redirecting queries to any custom API endpoint for the LLM part of the request? I would probably run a local LLM that uses the embeddings synced from my Cursor account into a local folder. That way everything would run on my machine and require less GPU processing on Cursor's backends. Their engine would then only be responsible for scanning web pages and making embeddings, plus the vector DB where all that is stored, plus the workspace scanning and storing in the DB. That's still a lot for the service to take care of, but it means less GPU cost across all customers, since $20/mo can burn rather quickly with heavy usage from some customers and the company wouldn't be profitable in that case. Local LLMs for 'power users' are a good solution.
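Just to make the "custom endpoint" idea concrete: most local servers (Ollama, llama.cpp's server, etc.) already speak the OpenAI-compatible API, so redirecting would basically mean swapping the base URL. A minimal sketch, assuming Ollama is running locally; the URL and model name depend on your setup:

```python
# Point an OpenAI-compatible client at a local LLM server instead of OpenAI.
from openai import OpenAI

local = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="not-needed-locally",          # most local servers ignore the key
)

resp = local.chat.completions.create(
    model="llama3",  # whatever model you have pulled locally
    messages=[{"role": "user", "content": "Explain this function: def add(a, b): return a + b"}],
)
print(resp.choices[0].message.content)
```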
But how can I use it? In the settings there's only a key field for OpenAI or Azure. How did you use a model from OpenRouter, and where do these models actually run?
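For what it's worth, OpenRouter exposes an OpenAI-compatible API, so outside of Cursor you can call it like this; the model ID below is just an example (check their model list for current names and pricing), and the models run on the upstream providers' servers, not on your machine:

```python
# Sketch of calling an OpenRouter-hosted model through its OpenAI-compatible API.
from openai import OpenAI

openrouter = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter key
)

resp = openrouter.chat.completions.create(
    model="anthropic/claude-3-opus",  # example model ID
    messages=[{"role": "user", "content": "Summarize what this repo does."}],
)
print(resp.choices[0].message.content)
```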
I see that Claude 3 on OpenRouter shows up to a 200k context window, but how can we control what exactly, and how much, gets sent to the LLMs from Cursor? With @docs and @codebase, for example, if we send 200k each time it would be easy to go bankrupt at Claude's pricing; I'm afraid to even try it lol. What system decides how much context the current prompt needs? (The context means 'all chat history', but if we refer to @docs and the backend pulls some embeddings and sends them to the LLM, we can't know how much of the context it will use, correct? And long chats will quickly fill up the 200k, but what happens after that? Do the LLM APIs automatically forget the oldest message content on their side?)
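As far as I know the chat APIs don't silently forget old messages; if the prompt exceeds the window you just get an error, so the caller (Cursor's backend in this case) has to decide what to drop. A minimal sketch of the usual client-side trimming, using a crude 4-chars-per-token estimate instead of a real tokenizer:

```python
# Drop the oldest non-system messages until the conversation fits a token budget.
# A real implementation would use the model's actual tokenizer, not this estimate.

def estimate_tokens(message: dict) -> int:
    return len(message["content"]) // 4 + 4  # +4 for role/formatting overhead

def trim_history(messages: list[dict], budget: int = 200_000) -> list[dict]:
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    total = sum(estimate_tokens(m) for m in system + rest)
    while rest and total > budget:
        total -= estimate_tokens(rest.pop(0))  # forget the oldest message first
    return system + rest

history = [
    {"role": "system", "content": "You are a coding assistant."},
    {"role": "user", "content": "Refactor this function..."},
    {"role": "assistant", "content": "Sure, here is a refactor..."},
]
print(trim_history(history, budget=50))
```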