Tutorial: Adding full repo context, pdfs and other docs

flight505 · December 10, 2024, 10:24pm

I wanted to start this post in hopes of encouraging others to share their workflows for incorporating docs and other forms of information into Cursor.
I find Github Gists one of the easiest ways to keep things a bit organized as I reuse variations of .cursorrules and .txt files.

Enhancing Cursor with additional documentation can streamline your workflow. Here’s a guide to incorporating PDFs and GitHub repository content into Cursor:

1. Adding PDFs to Cursor

To integrate PDF content into Cursor, convert the PDF into a text format that Cursor can index:

Convert the PDF: Use tools like Marker to convert your PDF into Markdown. For academic papers, enable full OCR and extract tables, while skipping figures and photos.
Create a GitHub Gist: Paste the converted Markdown into a new public Gist on GitHub. After saving, use the “clone as https” option to copy the link.
Add to Cursor: In Cursor, use the @Docs > Add New Doc feature to add the Gist link. Assign a name, set the entry point and prefix, and index the document.
Utilize in Cursor: Access the added document in your prompts using @Docs followed by the name you assigned. Test it by asking about a very specific section. Depending on your task, it is a good idea to check how well Cursor can use the newly created doc, for example whether it can summarize a section or extrapolate some meaning from it.

Optional: If you use Raycast, consider installing the GitHub Gist extension for easier access and searching your Gists.
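
If your converter still leaves image references in the Markdown, a small cleanup pass before pasting into the Gist can help. This is only a sketch: the regex assumes standard inline Markdown image syntax, and strip_images is a hypothetical helper of mine, not part of Marker.

```python
import re

# Matches inline Markdown images such as ![caption](figures/fig1.png)
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def strip_images(markdown: str) -> str:
    """Remove inline image references and collapse the blank lines they leave behind."""
    without_images = IMAGE_PATTERN.sub("", markdown)
    # Collapse runs of three or more newlines into a single blank line
    return re.sub(r"\n{3,}", "\n\n", without_images).strip() + "\n"


sample = "# Results\n\n![Figure 1](fig1.png)\n\nAccuracy improved.\n"
cleaned = strip_images(sample)
print(cleaned)
```

Tables and text are left untouched, so the output should still be fine to paste into a Gist.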

2. Incorporating GitHub Repository Content

To add context from a GitHub repository, such as README files and code examples:

Extract Repository Content: Tools like uithub.com allow you to extract specific files or folders from a repository. For example, to retrieve only Markdown files, append ?ext=md to the repository URL.
Create a Consolidated Gist: Combine the extracted content into a single Markdown file. Ensure the total size is manageable (ideally under 60,000 tokens) to facilitate indexing. After saving as PUBLIC, use the “clone as https” option to copy the link.
Add to Cursor: As with PDFs, add the Gist link to Cursor using the @Docs > Add New Doc feature, set the appropriate parameters, and index the document.
Utilize in Cursor: Reference the document in your prompts using @Docs followed by the assigned name.

By following these steps, you can effectively integrate external documents and repository content into Cursor, enhancing its utility in your development workflow.

Let's say you are building a local KG RAG and you want to use LightRAG. To integrate the LightRAG repository’s documentation and code examples into Cursor, follow these steps:

1. Extract Repository Content

Using uithub.com: Navigate to uithub.com and enter the LightRAG repository URL: https://github.com/HKUDS/LightRAG.
Filter for Markdown Files: To retrieve only Markdown files, append ?ext=md to the URL: https://uithub.com/HKUDS/LightRAG?ext=md.
Include Specific Folders: To include files from the examples directory, append /tree/main/examples to the URL: https://uithub.com/HKUDS/LightRAG/tree/main/examples.
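
If you do this for many repos, the URL rewriting can be scripted. A minimal sketch, assuming uithub mirrors GitHub's path layout and accepts the ext query parameter mentioned above; to_uithub is a hypothetical helper name, not anything uithub provides.

```python
from urllib.parse import urlencode, urlsplit


def to_uithub(github_url: str, subpath: str = "", ext: str = "") -> str:
    """Turn a GitHub repo URL into a uithub URL (assumed to mirror GitHub paths)."""
    parts = urlsplit(github_url)
    if parts.netloc not in ("github.com", "www.github.com"):
        raise ValueError(f"Not a GitHub URL: {github_url}")
    path = parts.path.rstrip("/")
    if subpath:
        # e.g. "tree/main/examples" to limit the output to one folder
        path += "/" + subpath.strip("/")
    query = urlencode({"ext": ext}) if ext else ""
    return f"https://uithub.com{path}" + (f"?{query}" if query else "")


print(to_uithub("https://github.com/HKUDS/LightRAG", ext="md"))
print(to_uithub("https://github.com/HKUDS/LightRAG", subpath="tree/main/examples"))
```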

2. Create a Consolidated GitHub Gist

Combine Extracted Content: Merge the extracted README and example files into a single Markdown document.
Manage Token Size: Keep the combined content at around 50,000 tokens to help indexing succeed in Cursor. (I often find that indexing fails for no obvious reason; please let me know if you know of Cursor's index limits or other causes of the failures.) It would be helpful if Cursor documented the inner workings of @Docs and the indexing process.
Create a Public Gist: On GitHub, create a new public Gist named LightRAG_docs, paste the consolidated content, and save it as public. Then use the “clone as https” option to copy the link.
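
To check the size before creating the Gist, a rough estimate is usually enough. The sketch below assumes about 4 characters per token, which is only a rule of thumb and not how Cursor actually counts; estimate_tokens and split_to_budget are hypothetical helpers.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; the 4 chars/token ratio is an assumption."""
    return int(len(text) / chars_per_token)


def split_to_budget(text: str, budget: int = 50_000) -> list[str]:
    """Split text on blank lines into parts that each stay near the token budget."""
    parts, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and estimate_tokens(candidate) > budget:
            # Adding this paragraph would exceed the budget: close the current part
            parts.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        parts.append(current)
    return parts


doc = "\n\n".join(["a" * 40] * 10)  # ten paragraphs of roughly 10 tokens each
chunks = split_to_budget(doc, budget=25)
print(len(chunks), [estimate_tokens(c) for c in chunks])
```

A single paragraph larger than the budget is kept whole rather than cut mid-sentence, so treat the budget as a target, not a hard limit. Each part can go into its own Gist if one file is too large to index.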

3. Add the Gist to Cursor

In Cursor: Use the @Docs > Add New Doc feature.
Provide Details:
- Name: Enter LightRAG_docs.
- Entrypoint and Prefix: Set these as needed for your indexing structure (same in this case).
- Link: Paste the Gist link.
Index the Document: Complete the process to make the document searchable within Cursor.

4. Utilize the Document in Cursor

Reference in Prompts: As usual, use @Docs followed by the document name in your prompts to access the content, e.g., @Docs LightRAG_docs.

So the basics are: write a script to convert the needed files to .txt or .md, then create a Gist or otherwise host the file so you can add it to Cursor. Tools like uithub.com are just a nice option; please post similar sites and tools.

Another option is to write a basic script yourself: download the repo, convert, host, add to Cursor. For example, this script combines several files into one:


import os
import sys


def main():
    args = sys.argv[1:]
    if len(args) < 2:
        print("Usage: python combine_files.py output_file input_file1 input_file2 ...")
        return

    output_filename = args[0]
    input_filenames = args[1:]

    output_data = ""

    for fname in input_filenames:
        try:
            file_path = os.path.join(os.getcwd(), fname)
            with open(file_path, "r", encoding="utf-8") as f:
                data = f.read()
                output_data += f"\n===== Filename: {fname} =====\n\n"
                output_data += data
                output_data += "\n\n"
        except Exception as e:
            print(f"Error reading {fname}: {e}")

    with open(output_filename, "w", encoding="utf-8") as f:
        f.write(output_data)
        print(f"Combined content written to {output_filename}")


if __name__ == "__main__":
    main()
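
If you have cloned a repo locally, you can also gather the input files automatically instead of listing them by hand. A minimal sketch; collect_files, the extension list, and the skipped folder names are my own assumptions, so adjust them to the repo.

```python
from pathlib import Path


def collect_files(root, extensions=(".md", ".txt"), skip_dirs=(".git", "node_modules")):
    """Return sorted paths under root whose suffix is in extensions."""
    root_path = Path(root)
    matches = []
    for path in sorted(root_path.rglob("*")):
        if not path.is_file():
            continue
        # Skip anything inside an excluded folder such as .git
        if any(part in skip_dirs for part in path.relative_to(root_path).parts):
            continue
        if path.suffix.lower() in extensions:
            matches.append(path)
    return matches


if __name__ == "__main__":
    # Print one path per line for the current directory
    for found in collect_files("."):
        print(found)
```

The printed paths can then be passed as the input files to the combine script above.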

flight505 · December 13, 2024, 7:39pm

If anyone uses Streamlit for prototype applications, dashboards, testing ideas, etc., you might have had issues with the outdated docs in Cursor @docs, as with many other docs. Pydantic AI can also extract and rewrite documentation through the same process as discussed above. I added this Streamlit doc for the December release 1.41, and it seems to work very well on my end. It is a shorter version based on the release updates and quick reference.

Here is a link → to the Gist

Or see it below; please comment if it works for you or if you have suggestions for what I should add to the Pydantic AI script. FYI, it is intended to be less than 30,000 tokens.

Click to expand the Streamlit 1.41 docs

Streamlit API Cheat Sheet

This is a comprehensive summary of the docs for the latest version of Streamlit, v1.41.0.

Release Notes for Streamlit v1.41.0

Streamlit v1.41.0 introduces several new features, enhancements, and bug fixes to improve the developer experience and expand the capabilities of your Streamlit applications.

New Features

Native Support for Async Functions: Streamlit now natively supports asynchronous functions, allowing for more efficient handling of I/O-bound operations.

import streamlit as st
import asyncio

async def fetch_data():
    await asyncio.sleep(1)
    return "Data fetched!"

async def main():
    data = await fetch_data()
    st.write(data)

asyncio.run(main())

Tertiary Button Type: st.button accepts type="tertiary" for a less prominent button. Note that st.button has no style parameter.

st.button("Submit", key="submit_btn", disabled=False, type="tertiary")

Caching Controls: st.cache_data accepts ttl and max_entries, which give more granular control over cache invalidation.
```
import pandas as pd

@st.cache_data(ttl=3600, max_entries=100)
def load_data(path):
    return pd.read_csv(path)
```

Expanded Media Support: Additional media formats are now supported, including SVG images and WebM videos.

from pathlib import Path

image_path = Path("./diagram.svg")
video_path = Path("./animation.webm")

st.image(image_path)
st.video(video_path)

Multi-Page App Enhancements: Improved navigation and state management across multiple pages in a Streamlit app.
```
# Theme colors such as primaryColor are set in .streamlit/config.toml, not in st.set_page_config
st.set_page_config(page_title="My App", layout="wide")
```

Pathlib Support: Streamlit supports pathlib.Path objects everywhere you can use a string path.

from pathlib import Path

data_path = Path("data/my_data.csv")
st.dataframe(pd.read_csv(data_path))

Date and Time Inputs Accept ISO Strings: [st.date_input] and [st.time_input] accept ISO formatted strings for initial values.
```
st.date_input("Select a date", value="2024-12-13")
st.time_input("Select a time", value="14:30:00")
```

Async Generators in st.write_stream: [st.write_stream] accepts async generators, which it converts internally to sync generators.

import streamlit as st
import asyncio

async def async_generator():
    for i in range(5):
        await asyncio.sleep(1)
        yield f"Message {i}"

# Pass the async generator itself; st.write_stream consumes it and renders each chunk
st.write_stream(async_generator())

Enhancements

Performance Optimizations: Reduced load times and improved rendering performance for large datasets and complex layouts.
Accessibility Improvements: Enhanced support for screen readers and keyboard navigation to make apps more accessible.
Better Error Messages: More informative and user-friendly error messages to aid in debugging.
Theming Enhancements: Text and background color in Markdown can use the “primary” color from the theme.primaryColor configuration option.

Bug Fixes

Fixed issues related to widget state persistence across reruns.
Resolved compatibility problems with Python 3.13 and dropped support for Python 3.8.
Addressed layout inconsistencies in the sidebar and main content areas.
Multiple other bug fixes as detailed in the release notes.

Installation & Import
Command Line Interface
Magic Commands
Display Text
Display Data
Display Media
Display Charts
Sidebar Elements
Layout Management
Tabs
Expandable Containers
Control Flow
Interactive Widgets
Chat-Based Apps
Data Mutation
Display Code
Placeholders, Help, and Options
Data Source Connections
Performance Optimization
Progress and Status Indicators
User Personalization
Advanced Features

Install & Import

Install Streamlit

pip install streamlit

Run Your First App

streamlit run first_app.py

Import Convention

import streamlit as st

Pre-release Features

pip uninstall streamlit
pip install streamlit-nightly --upgrade

Learn more about experimental features

Command Line

Streamlit CLI Commands

streamlit --help                # Show all commands
streamlit run your_script.py    # Run a Streamlit app
streamlit hello                 # Launch the Streamlit hello app
streamlit config show           # Show current config
streamlit cache clear           # Clear cached data
streamlit docs                  # Open Streamlit documentation
streamlit --version             # Show Streamlit version

Magic Commands

Magic commands implicitly call st.write(). These commands allow you to write Markdown, display variables, and more without explicitly calling st.write().

# Implicitly calls st.write()
"_This_ is some **Markdown**"
my_variable
"dataframe:", my_data_frame

Magic Commands Enhancements

# Display dynamic markdown with variables
name = "Alice"
st.markdown(f"Hello, **{name}**!")

# Conditional rendering
if condition:
    "Condition is True"
else:
    "Condition is False"

Display Text

Streamlit offers various functions to display different types of text and formatted content:

st.write("Most objects")  # Display various objects
st.write(["st", "is <", 3])  # Display lists
st.write_stream(my_generator)  # Stream data
st.write_stream(my_llm_stream)  # Stream from language models

st.text("Fixed width text")  # Display fixed-width text
st.markdown(":primary[_Markdown_]")  # Render Markdown in the theme's primary color
st.latex(r""" e^{i\pi} + 1 = 0 """)  # Render LaTeX

st.title("My Title")  # Large header
st.header("My Header")  # Medium header
st.subheader("My Sub")  # Smaller header

st.code("for i in range(8): foo()")  # Display code snippets
st.html("<p>Hi!</p>")  # Render raw HTML

Display Data

st.dataframe

Display a dataframe as an interactive table. This command works with a wide variety of collection-like and dataframe-like object types.

st.dataframe(my_dataframe)  # Interactive DataFrame, now supports more dataframe formats including pathlib.Path

Function Signature

st.dataframe(
    data=None,
    width=None,
    height=None,
    *,
    use_container_width=False,
    hide_index=None,
    column_order=None,
    column_config=None,
    key=None,
    on_select="ignore",
    selection_mode="multi-row"
)

Parameters

data (dataframe-like, collection-like, or None): The data to display. Supports pandas, Polars, Snowflake, and more. If data is None, an empty table is rendered.
width (int or None): Desired width in pixels. If None, fits contents up to the parent container’s width.
height (int or None): Desired height in pixels. Defaults to showing at most ten rows with vertical scrolling.
use_container_width (bool): If True, overrides width to match the parent container’s width.
hide_index (bool or None): If True, hides the index column(s). Automatically determined if None.
column_order (Iterable of str or None): Ordered list of columns to display. None displays all columns in their original order.
column_config (dict or None): Customizes column display, such as names, visibility, types, widths, and formats. Use _index to configure index columns.
key (str): Unique identifier for the dataframe in Session State.
on_select (“ignore” or “rerun” or callable): Defines response to user selection events.
- "ignore": No interaction.
- "rerun": App reruns upon selection.
- callable: Executes a callback before rerun.
selection_mode (“single-row”, “multi-row”, “single-column”, “multi-column”, or Iterable): Types of selections allowed.

Returns

element or dict: Returns an internal placeholder for adding rows if on_select="ignore". Otherwise, returns a dictionary-like object with selection data.

Examples

# Basic usage
st.dataframe(pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['a', 'b', 'c']
}))

# Customizing columns
st.dataframe(
    df,
    column_order=["B", "A"],
    column_config={
        "A": st.column_config.NumberColumn("Numbers", format="$%d"),
        "B": st.column_config.TextColumn("Letters")
    }
)

# Handling selections (the callback takes no arguments; read the selection from Session State via the key)
def handle_selection():
    st.write("Selected:", st.session_state["my_df"])

st.dataframe(
    df,
    key="my_df",
    on_select=handle_selection,
    selection_mode="multi-row"
)

st.form

Create a form that batches elements together with a “Submit” button. Forms are containers that group widgets and contain a Submit button. When submitted, all widget values inside the form are sent to Streamlit in a batch.

with st.form(key="my_form"):
    username = st.text_input("Username")
    password = st.text_input("Password", type="password")
    st.form_submit_button("Login")

Function Signature

st.form(
    key,
    clear_on_submit=False,
    *,
    enter_to_submit=True,
    border=True
)

Parameters

key (str): Unique identifier for the form.
clear_on_submit (bool): If True, resets widgets to default values after submission. Defaults to False.
enter_to_submit (bool): If True, pressing Enter submits the form. Defaults to True.
border (bool): If True, shows a border around the form. Defaults to True.

Constraints

Every form must contain a st.form_submit_button.
st.button and st.download_button cannot be added to a form.
Forms cannot be nested within other forms.
Only st.form_submit_button can have a callback within a form.

Examples

# Basic form
with st.form(key="login_form"):
    username = st.text_input("Username")
    password = st.text_input("Password", type="password")
    submit = st.form_submit_button("Login")
    if submit:
        st.write(f"Welcome, {username}!")

# Form with clearing on submit
with st.form(key="search_form", clear_on_submit=True):
    query = st.text_input("Search Query")
    submit = st.form_submit_button("Search")
    if submit:
        results = perform_search(query)
        st.write(results)

# Form with custom submission behavior
def handle_submit():
    st.write("Form submitted!")

with st.form(key="custom_form"):
    data = st.text_area("Enter data")
    submit = st.form_submit_button("Submit", on_click=handle_submit)

Display Media

from pathlib import Path

image_path = Path("./header.png")
audio_path = Path("./audio.mp3")
video_path = Path("./video.webm")

st.image(image_path)  # Display an image, supports pathlib.Path
st.audio(audio_path)  # Play audio data
st.video(video_path, autoplay=True, muted=True)  # Play video data with autoplay and mute
st.logo("logo.jpg", size="large")  # Display a logo image; size is "small", "medium", or "large"

Display Charts

# Built-in charts
st.area_chart(df, use_container_width=True, height=400)  # Area chart with container width and height
st.bar_chart(df, horizontal=True)  # Horizontal bar chart
st.line_chart(df, use_container_width=True)  # Line chart with tooltips on hover
st.map(df)  # Geospatial data (expects latitude and longitude columns)
st.scatter_chart(df)

# External libraries
st.altair_chart(chart)
st.bokeh_chart(fig)
st.graphviz_chart(fig)
st.plotly_chart(fig, config={"displayModeBar": False})  # Plotly charts with tooltips
st.pydeck_chart(chart, height=600)
st.pyplot(fig)  # Matplotlib figures
st.vega_lite_chart(df, spec)

# Interactive charts with user selections
event = st.plotly_chart(fig, on_select="rerun")
event = st.altair_chart(chart, on_select="rerun")
event = st.vega_lite_chart(df, spec, on_select="rerun")

Add Elements to Sidebar

# Directly add to sidebar
a = st.sidebar.radio("Select one:", [1, 2])

# Using "with" notation
with st.sidebar:
    st.radio("Select one:", [1, 2])

Columns

# Two equal columns with optional border
col1, col2 = st.columns(2)
col1.write("This is column 1")
col2.write("This is column 2")

# Three columns with different widths and optional border
col1, col2, col3 = st.columns([3, 1, 1], border=True)  # col1 is larger

# Bottom-aligned columns
col1, col2 = st.columns(2, vertical_alignment="bottom")

# Using "with" notation
with col1:
    st.radio("Select one:", [1, 2])

# Freezing (pinning) a dataframe column with column configuration
st.dataframe(df, column_config={"A": st.column_config.Column(pinned=True)})

Tabs

# Create tabs
tab1, tab2 = st.tabs(["Tab 1", "Tab2"])
tab1.write("This is tab 1")
tab2.write("This is tab 2")

# Using "with" notation
with tab1:
    st.radio("Select one:", [1, 2])

Expandable Containers

expand = st.expander("My label", icon=":material/info:")
expand.write("Inside the expander.")
pop = st.popover("Button label")
pop.checkbox("Show all")

# Using "with" notation
with expand:
    st.radio("Select one:", [1, 2])

Control Flow

# Stop execution immediately
st.stop()

# Rerun script immediately
st.rerun()

# Navigate to another page
st.switch_page("pages/my_page.py")

# Define a navigation widget in your entrypoint file (st.navigation takes a list of pages)
pg = st.navigation([
    st.Page("page1.py", title="Home", url_path="home", default=True),
    st.Page("page2.py", title="Preferences", url_path="settings"),
])
pg.run()

# Group multiple widgets in a form
with st.form(key="my_form"):
    username = st.text_input("Username")
    password = st.text_input("Password", type="password")
    st.form_submit_button("Login")

# Define a dialog function
@st.dialog("Welcome!")
def modal_dialog():
    st.write("Hello")

modal_dialog()

# Define a fragment for reusable UI components
@st.fragment
def fragment_function():
    df = get_data()
    st.line_chart(df)
    st.button("Update")

fragment_function()

# Using st.write_stream with async generators
import asyncio

async def async_gen():
    for i in range(5):
        await asyncio.sleep(1)
        yield f"Message {i}"

st.write_stream(async_gen())

Display Interactive Widgets

# Buttons and Actions
st.button("Click me", type="tertiary")
st.download_button("Download file", data)
st.link_button("Go to gallery", url)
st.page_link("app.py", label="Home", icon=":material/home:")

# Data Editing
st.data_editor(data)  # The first argument is the data itself; there is no label

# Selection Widgets
st.checkbox("I agree")
st.feedback("thumbs")
st.pills("Tags", ["Sports", "Politics"])
st.radio("Pick one", ["cats", "dogs"])
st.segmented_control("Filter", ["Open", "Closed"])
st.toggle("Enable")
st.selectbox("Pick one", ["cats", "dogs"])
st.multiselect("Buy", ["milk", "apples", "potatoes"])

# Input Widgets
st.slider("Pick a number", 0, 100)
st.select_slider("Pick a size", ["S", "M", "L"])
st.text_input("First name")
st.number_input("Pick a number", 0, 10)
st.text_area("Text to translate")
st.date_input("Your birthday", value="1990-01-01")
st.time_input("Meeting time", value="09:00:00")
st.file_uploader("Upload a CSV", type=["csv"])
st.audio_input("Record a voice message")
st.camera_input("Take a picture")
st.color_picker("Pick a color")

# Using widget values in variables
for i in range(int(st.number_input("Num:"))):
    foo()
if st.sidebar.selectbox("I:", ["f"]) == "f":
    b()
my_slider_val = st.slider("Quinn Mallory", 1, 88)
st.write(my_slider_val)

# Disable widgets to remove interactivity
st.slider("Pick a number", 0, 100, disabled=True)

Build Chat-Based Apps

# Insert a chat message container
with st.chat_message("user"):
    st.write("Hello 👋")
    st.line_chart(np.random.randn(30, 3))

# Display a chat input widget at the bottom of the app
st.chat_input("Say something")

# Display a chat input widget inline
with st.container():
    st.chat_input("Say something")

Learn how to Build a basic LLM chat app

Mutate Data

# Add rows to a dataframe after showing it
element = st.dataframe(df1)
element.add_rows(df2)

# Add rows to a chart after showing it
element = st.line_chart(df1)
element.add_rows(df2)

Display Code

with st.echo():
    st.write("Code will be executed and printed")
    import pandas as pd
    df = pd.DataFrame({"A": [1, 2, 3]})
    st.write(df)

Placeholders, Help, and Options

# Replace any single element
element = st.empty()
element.line_chart(...)
element.text_input(...)  # Replaces the previous element

# Insert out of order using containers
elements = st.container()
elements.line_chart(...)
st.write("Hello")
elements.text_input(...)  # Appears above "Hello"

# Display help and access options
st.help(pandas.DataFrame)  # Display help for pandas DataFrame
st.get_option(key)  # Get a Streamlit config option
st.set_option(key, value)  # Set a Streamlit config option

# Configure page settings (theme colors belong in .streamlit/config.toml, not here)
st.set_page_config(layout="wide", page_title="My App", page_icon=":smile:")

# Manage query parameters
st.query_params[key]
st.query_params.from_dict(params_dict)
st.query_params.get_all(key)
st.query_params.clear()

# Render raw HTML
st.html("<p>Hi!</p>")

Connect to Data Sources

# Define a connection to a SQL database
st.connection("pets_db", type="sql")
conn = st.connection("sql")
conn = st.connection("snowflake")

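# Custom connection class (myconn is a placeholder for your client library)
from streamlit.connections import BaseConnection

class MyConnection(BaseConnection[myconn.MyConnection]):
    def _connect(self, **kwargs) -> myconn.MyConnection:
        return myconn.connect(**self._secrets, **kwargs)

    def query(self, query):
        return self._instance.query(query)

# Using the connection (create it through st.connection so secrets and caching are handled)
my_conn = st.connection("my_conn", type=MyConnection)
result = my_conn.query("SELECT * FROM pets")
st.write(result)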

Optimize Performance

Cache Data Objects

@st.cache_data
def foo(bar):
    # Perform expensive computation or data retrieval
    return data

# Execute foo
d1 = foo(ref1)
# Retrieve cached result
d2 = foo(ref1)  # d1 == d2

# Different argument triggers recomputation
d3 = foo(ref2)

# Clear specific cache entry
foo.clear(ref1)

# Clear all cache entries for the function
foo.clear()

# Clear all cached data
st.cache_data.clear()

Cache Global Resources

@st.cache_resource
def foo(bar):
    # Create and return a resource
    return session

# Execute foo
s1 = foo(ref1)
# Retrieve cached resource
s2 = foo(ref1)  # s1 == s2

# Different argument triggers recomputation
s3 = foo(ref2)

# Clear specific cache entry
foo.clear(ref1)

# Clear all cache entries for the function
foo.clear()

# Clear all cached resources
st.cache_resource.clear()

Display Progress and Status

import time

# Show a spinner during a process
with st.spinner(text="In progress"):
    time.sleep(3)
    st.success("Done")

# Show and update a progress bar
bar = st.progress(50)
time.sleep(3)
bar.progress(100)

# Show and update a status message
with st.status("Authenticating...") as s:
    time.sleep(2)
    st.write("Some long response.")
    s.update(label="Response")

# Visual effects
st.balloons()  # Display balloons animation
st.snow()      # Display snow animation
st.toast("Warming up...")  # Show a toast message

# Display different types of messages
st.error("Error message")
st.warning("Warning message")
st.info("Info message")
st.success("Success message")
st.exception(e)  # Display exception details

Personalize Apps for Users

# Show different content based on the user's email address
if st.experimental_user.email == "jane@example.com":
    display_jane_content()
elif st.experimental_user.email == "adam@example.com":
    display_adam_content()
else:
    st.write("Please contact us to get access!")

# Access cookies and headers
cookies = st.context.cookies
headers = st.context.headers

Advanced Features

Theming and Styling

# Set page options in the script
st.set_page_config(
    page_title="Themed App",
    layout="centered",
    initial_sidebar_state="expanded",
)

# Set the theme in .streamlit/config.toml (st.set_page_config has no theme parameter):
# [theme]
# primaryColor = "#F63366"
# backgroundColor = "#FFFFFF"
# secondaryBackgroundColor = "#F0F2F6"
# textColor = "#262730"
# font = "sans serif"

# Inject custom CSS
st.markdown(
    """
    <style>
    .big-font {
        font-size:50px !important;
    }
    </style>
    """,
    unsafe_allow_html=True
)

st.markdown('<p class="big-font">This is a big font text</p>', unsafe_allow_html=True)

Session State Management

# Initialize session state
if 'count' not in st.session_state:
    st.session_state.count = 0

# Increment counter
def increment():
    st.session_state.count += 1

st.button("Increment", on_click=increment)
st.write(f"Count: {st.session_state.count}")

# Using session state in widgets
selected_option = st.selectbox("Choose", ["Option 1", "Option 2"], key="select")
st.write(f"Selected: {st.session_state.select}")

Custom Components

import streamlit.components.v1 as components

# Define a custom component
my_component = components.declare_component("my_component", path="frontend/build")

# Use the custom component
output = my_component(key="unique_key", some_prop="value")
st.write(output)

Internationalization (i18n)

from gettext import gettext as _

# Set language
language = st.selectbox("Select language", ["en", "es", "fr"])

# Load translations
translations = {
    "en": {"greet": "Hello"},
    "es": {"greet": "Hola"},
    "fr": {"greet": "Bonjour"},
}

st.write(translations[language]["greet"])

Additional Resources

Tips & Best Practices

Use Caching Wisely: Cache data and resources to improve performance but be mindful of cache invalidation to ensure data freshness.
Optimize Layouts: Utilize columns, tabs, and sidebars to create intuitive and organized interfaces.
Leverage Session State: Maintain user interactions and data across app reruns using st.session_state.
Enhance Accessibility: Ensure your app is accessible by following accessibility best practices and utilizing Streamlit’s accessibility features.
Secure Sensitive Data: Handle user data and secrets securely using Streamlit’s secrets management.

Troubleshooting

App Not Updating: Ensure that widgets are correctly placed and that caching isn’t preventing updates. Use st.rerun() if necessary.
Performance Issues: Profile your app to identify bottlenecks. Optimize data processing and leverage caching where appropriate.
Widget State Loss: Use unique keys for widgets to maintain state across reruns and interactions.
Layout Problems: Use Streamlit’s layout primitives like columns and containers to manage complex layouts effectively.

simjak · December 15, 2024, 9:19am

Great workflow! As an alternative for ingesting repo content, you can try an OSS service: https://gitingest.com/

flight505 · December 15, 2024, 3:09pm

Very nice! I guess there are several tools similar to Uithub and Gitingest; they are just not that easy to find and not really mentioned on the Cursor forum. The forum is getting a bit confusing and crowded with single posts asking for model integration each week when a new model is released or complaints about Cursor. Cursor should hire a few moderators to keep useful posts organized and visible.

Thanks for sharing useful information

dbsx · December 17, 2024, 1:12pm

I guess the entry point and the prefix link should be the same as the gist link since there is only one file.

Also, I guess private gists are not supported as documents; it would be more user-friendly if they were.

flight505 · December 17, 2024, 2:02pm

When using a single PDF as a gist, you use the entry point and the prefix link for the link you create. Cursor needs public gists for this workflow, and you should use the “clone as https” option to copy the link.

You can also use the agent tools now and create a .txt file containing the PDFs, adding it to your project repo.

It is easy and fast to just use the index for docs and repo ingest, as Cursor can index it directly. Using the Agent and writing instructions will be the way to do it in the upcoming Cursor versions. However, you can already write and run parallel and sequential agentic tasks. There are some interesting examples out there in the void, pushing the capabilities of Cursor.

dbsx · December 18, 2024, 2:06am

Thank you for your feedback. I understand that if there is a single data source (like one PDF or repository), we should use the same link for the prefix and entry points. However, suppose there are multiple data sources (repositories, PDFs, or different file types). In that case, we should organize them as separate files in the gist, with the entry point linking to one specific file and the prefix pointing one level up in the directory structure. Is this correct?

flight505 · December 18, 2024, 2:24pm

If you plan to utilize information from PDFs, including tables, text, and mathematical formulas, you can consolidate this data into a single .md file. By adding this file to a GitHub Gist and indexing it in Cursor, you can test Cursor’s ability to reference and infer from the content. Using tools like Marker-PDF can assist in extracting tables and mathematical formulas accurately.

This is based on information from the Cursor forums and docs, so it might not be accurate, and you might have to change the settings a bit.

Method of Adding and Indexing Documentation:

When you open a folder in Cursor, it scans the directory and computes a Merkle tree of hashes for all files, excluding those specified in .gitignore or .cursorignore. The Merkle tree is then synchronized with Cursor’s servers. Every 10 minutes, Cursor checks for hash mismatches to identify changed files and uploads only those. On the server, files are chunked and embedded, with embeddings stored in a vector database. Each vector is associated with an obfuscated relative file path and corresponding line range.
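
To make the hashing idea concrete, here is a toy version of change detection with a Merkle-style root. It is only a sketch of the general technique, not Cursor's actual implementation: I use an in-memory dict of paths to bytes and a flat binary tree over the sorted file list, and all function names are made up.

```python
import hashlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def leaf_hashes(files: dict[str, bytes]) -> dict[str, str]:
    """Hash each file's content; files maps a relative path to its bytes."""
    return {path: sha256(content) for path, content in files.items()}


def merkle_root(leaves: dict[str, str]) -> str:
    """Combine leaf hashes pairwise until a single root hash remains."""
    # Include the path in each leaf so a rename also changes the root
    level = [sha256(f"{path}:{digest}".encode()) for path, digest in sorted(leaves.items())]
    if not level:
        return sha256(b"")
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [sha256((level[i] + level[i + 1]).encode()) for i in range(0, len(level), 2)]
    return level[0]


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Paths that are new or whose content hash differs."""
    return sorted(path for path in new if old.get(path) != new[path])


before = leaf_hashes({"a.py": b"print(1)", "b.py": b"print(2)", "c.md": b"# notes"})
after = leaf_hashes({"a.py": b"print(1)", "b.py": b"print(3)", "c.md": b"# notes"})

# Comparing roots is cheap; only when they differ do we look for the changed files
if merkle_root(before) != merkle_root(after):
    print("changed:", changed_files(before, after))
```

Only the files reported as changed would need to be uploaded again, which matches the incremental sync described above.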

Local vs. Remote Storage:

While Cursor performs local scanning and hashing of your codebase, the embeddings and some metadata are processed and stored on Cursor’s servers. This approach allows for efficient indexing and retrieval but involves cloud storage of certain data. However, Cursor offers a “Privacy Mode” that ensures your code is only stored locally, enhancing security and privacy.

Testing Cursor’s Inference Capabilities:

To test Cursor’s ability to reference and utilize the extracted information, you can:

Consolidate Extracted Data: Use Marker to extract the necessary tables and formulas from your PDFs and convert them into a .md file.
Add to Your Project: Incorporate the Markdown file into your project’s directory within Cursor.
Interact with the AI: Use Cursor’s AI features to query or reference the information contained in the Markdown file. For instance, you can ask the AI to explain a particular formula or utilize a table’s data within your code.
Evaluate Responses: Assess how accurately and effectively Cursor’s AI references and integrates the extracted information into its suggestions or code generation.

By following these steps, you can determine how well Cursor can leverage external data extracted from PDFs in your coding workflow.

You can also try using .cursorrules to pin the docs that way and then use chat.
Or: Cursor supports the integration of custom AI agents to enhance its functionality. By defining an OpenAI agent tailored to your needs, you can extend Cursor’s capabilities to perform specific tasks or provide specialized assistance. This involves creating configuration files that define the agent’s behavior and adding them to your project within Cursor. Implementing custom agents requires experimentation to achieve the desired outcomes.

flight505 · January 15, 2025, 3:25pm

https://gist.github.com/flight505/11cd7d79e1133e77f17a85ca92db26c0

howardx · January 19, 2025, 5:06am

Regarding adding public Git repos to context, why not open Cursor at a parent folder, clone the repo you are interested in locally, then use Cursor’s @Files or @Code and pick that entire cloned folder? I'm curious what the trade-off is between these two approaches.

I guess the same goes for PDFs: if we extract the content and dump it into a local file, then use @File, would it be better or worse?

Thank you.

clearloop · February 10, 2025, 6:02pm

Great app! However, it does not generate static files, so Cursor still cannot index repo content with it XD…

I just wrote https://gitcursor.vercel.app/ to solve this problem; feel free to try it out!

T1000 · February 10, 2025, 6:10pm

You are aware that @Docs indexes the content for ALL users on Cursor servers? Why waste so much computing power and storage to expose 3rd-party data that's unrelated to most users? Otherwise, I'm not sure why you would overcomplicate things: you can refer to any URL with @anyurlyouneed…

techup · March 29, 2025, 8:59am

Really really nice project! You just saved my day and saved me from wasting lots of time creating something similar. Would you mind sharing the source or adding some features? I would like to be able to filter the files and subfolders to reduce the number of files Cursor needs to index. For Cursor to still be able to index the result, I guess the filter would need to be added to the URL as an encrypted “folder” string, before the actual user/repo.

In my case I want to limit it to *.node.ts files only.
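Until a hosted tool supports such a filter, one local workaround is to clone the repo and bundle only the matching files into a single Markdown document. The sketch below is an illustration only: `bundle_matching` is a hypothetical helper, and it assumes the repository has already been cloned to a local folder.

```python
from pathlib import Path

FENCE = "`" * 3  # built at runtime so this example stays inside one code block


def bundle_matching(repo_dir: Path, pattern: str = "*.node.ts") -> str:
    """Concatenate every file matching `pattern` into one Markdown string."""
    sections = []
    for path in sorted(repo_dir.rglob(pattern)):
        rel_parts = path.relative_to(repo_dir).parts
        # Skip directories and anything inside the .git folder.
        if not path.is_file() or ".git" in rel_parts:
            continue
        body = path.read_text(encoding="utf-8", errors="replace").rstrip()
        heading = "/".join(rel_parts)
        sections.append(f"## {heading}\n\n{FENCE}ts\n{body}\n{FENCE}\n")
    return "\n".join(sections)
```

The returned string can be saved as a .md file and either referenced locally or pasted into a Gist.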

Btw no idea what T1000 is talking about or who he is referring to.

flight505 · April 2, 2025, 9:36pm

You can check out this tool I made. It's an app that helps you work on Cursor projects and collaborate with Claude Desktop and Code.

ContextCraft: Code Context Extraction Tool

ContextCraft is an Electron application designed to extract context from your codebase for use with models like Claude or others, especially if you are using O1. Think of it like “Repo Mix” but with a user-friendly interface!

Features

Context Extraction: Easily pull relevant code snippets for your AI models.
O1 Optimization: Tailored for use with O1 projects.
Code Compression: Reduce the size of your codebase context.
Multi-Repo Learning: Select specific parts from other repositories to learn from and integrate into your existing code.
User Interface: A graphical interface makes context extraction a breeze.

Download

You can download ContextCraft from GitHub: https://github.com/flight505/ContextCraft

Inspiration

This project is based on a video by Kevin Leneway. Check out his channel here: https://www.youtube.com/@kevinleneway2290

benjaminfortunato · April 7, 2025, 1:54pm

I’m wondering if there is a good source for documentation that is web-based and not available on GitHub. Most docs are available on GitHub, but sometimes I come across documentation that provides a GitHub README and then refers to an external URL.

I thought there might be a site that already turns doc sites into a downloadable Markdown file?

benjaminfortunato · April 7, 2025, 3:36pm

I noted that https://uithub.com/serverless/serverless/tree/main/docs and https://gitingest.com/ both produce one mega file. I wrote a quick script to divide this up into smaller markdown files. The thinking was that I can’t send 170 tokens to the AI and expect it to figure things out, but maybe Cursor is indexing these files on the backend? I’m using the cursor rules and then an @file to refer to specific markdown files. With this modular structure it does mean I need to be more explicit about what I include.

Does anyone know what the best way to do this is: a monolithic file or a bunch of smaller files?
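For anyone wanting to try the same split, here is a rough sketch of such a script (not the one mentioned above). It assumes each file in the dump starts with a header made of a line of "=" characters, a "File: <path>" line, and another line of "=" characters; that header format, and the helper names `split_dump` and `write_parts`, are assumptions, so adjust the pattern to whatever your dump actually contains.

```python
import re
from pathlib import Path

# Assumed header: a line of "=", then "File: <path>", then another line of "=".
HEADER = re.compile(r"^={5,}\n(?:File|FILE): (?P<path>.+)\n={5,}\n", re.MULTILINE)


def split_dump(dump: str) -> dict[str, str]:
    """Map each file path found in the dump to the text that follows its header."""
    matches = list(HEADER.finditer(dump))
    parts: dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(dump)
        parts[match.group("path").strip()] = dump[match.end():end].strip() + "\n"
    return parts


def write_parts(parts: dict[str, str], out_dir: Path) -> list[Path]:
    """Write one Markdown file per section, flattening paths into file names."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for rel, body in parts.items():
        name = rel.replace("/", "__")
        if not name.endswith(".md"):
            name += ".md"
        target = out_dir / name
        target.write_text(body, encoding="utf-8")
        written.append(target)
    return written
```

Flattening the paths keeps every section in one folder, which makes it easier to pick individual files with @file.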

benjaminfortunato · April 7, 2025, 4:20pm

I noticed that the Docs upload is pretty good at working with GitHub documentation. It seems that to get documentation from other sites you need to crawl the site with a custom script. I’m wondering whether, for the GitHub markdown, it is better to use Cursor’s indexing and the @Docs feature or the actual @Files and cursor rules?

