Custom Models in Smart Window - Bring your own endpoint

Last updated: 1 week ago
Note: The Smart Window feature will be gradually made available to the Firefox user base, starting with users in the United States and Canada in Firefox version 150.

Smart Window lets you connect your own AI model instead of using the ones provided by Firefox. This is helpful if you want more control, use a specific provider, or run a model locally on your device.

You can connect either:

  • A remote model (such as OpenRouter)
  • A local model running on your device (such as Lemonade Server or Ollama)
Note: If you use a custom model, Smart Window may not work as expected. This feature and these instructions are intended for users who are already familiar with these services and tools.

Use a remote model (OpenRouter)

  1. Create an OpenRouter account if you do not have one already, at https://openrouter.ai/.
  2. Generate an API Key in OpenRouter, and copy it to a secure place.
    • OpenRouter API keys begin with sk-or-v1-.
  3. Open the OpenRouter models page and choose a model you would like to use.
    • Take note of its model ID (for example, z-ai/glm-4.5-air:free).
  4. In Firefox, open Settings: on macOS, click Firefox in the Menu bar at the top of the screen and select Settings (or Preferences, in some cases); on Windows and Linux, click the menu button and select Settings.
  5. Go to AI Controls > Smart Window Settings > Assistant model.
  6. Select Custom: Use your own LLM.
  7. Fill in the fields:
    • Model name: Paste the OpenRouter model ID from step 3.
    • Model endpoint: Enter the OpenRouter API endpoint, which is typically https://openrouter.ai/api/v1.
    • API key: Paste your OpenRouter API key from step 2.
  8. Click Save.
  9. Open a Smart Window, and start using the Assistant.
Tip: You can find free models on OpenRouter by searching for “free” on the models page.
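If the Assistant does not respond after you save these settings, you can check your key and model ID outside Firefox. The sketch below (a minimal example, not part of Firefox; the key and model ID are placeholders from the steps above) builds the same kind of OpenAI-compatible chat-completion request that Smart Window sends to the endpoint:

```python
import json
import urllib.request

API_KEY = "sk-or-v1-your-key-here"   # your API key from step 2 (placeholder)
MODEL_ID = "z-ai/glm-4.5-air:free"   # the model ID from step 3

# OpenRouter exposes an OpenAI-compatible chat-completions route
# under the same base URL you enter in the Model endpoint field.
payload = {
    "model": MODEL_ID,
    "messages": [{"role": "user", "content": "Reply with one word: ready"}],
}
req = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# Uncomment to actually send the request; a JSON response containing
# a "choices" field means the key and model ID are valid.
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

If this request fails with an authorization error, re-check the API key; if it fails with a model error, re-check the model ID on the OpenRouter models page.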

Use a local model

Example: Lemonade Server

  1. Download and install Lemonade Server at https://lemonade-server.ai/. You must use version 10.2.0 or newer.
  2. Run Lemonade Server and download a model of your choice using the app instructions.
  3. In a command line terminal, set a larger context size by using the command lemonade config set ctx_size=8192.
  4. Reload the model from the UI or by using the command lemonade unload (the next time you make a request to the model, it will load with your settings).
  5. In Firefox, open Settings: on macOS, click Firefox in the Menu bar at the top of the screen and select Settings (or Preferences, in some cases); on Windows and Linux, click the menu button and select Settings.
  6. Go to AI Controls > Smart Window Settings > Assistant model.
  7. Select Custom: Use your own LLM.
  8. Fill in the fields:
    • Model name: Enter your model name from step 2 (for example, SmolLM3-3B-GGUF).
    • Model endpoint: Enter the Lemonade Server endpoint, which is typically http://localhost:13305/api/v1.
    • Note that no API key is required for Lemonade Server.
  9. Click Save.
  10. Open a Smart Window, and start using the Assistant.
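To confirm that Lemonade Server is reachable at the endpoint you entered, you can list its downloaded models. This is a small sketch (not part of Firefox; it assumes the default port from step 8, which may differ if you changed it in Lemonade's settings) that queries the server's OpenAI-compatible models route:

```python
import json
import urllib.request

# Lemonade Server's OpenAI-compatible base URL (the value from step 8).
BASE_URL = "http://localhost:13305/api/v1"

# The /models route lists every model the server has downloaded;
# the "id" values are what you paste into the Model name field.
url = f"{BASE_URL}/models"
# Uncomment while Lemonade Server is running:
# with urllib.request.urlopen(url) as resp:
#     for model in json.load(resp)["data"]:
#         print(model["id"])
```

If the request is refused, make sure Lemonade Server is running and that the port matches the one shown in its app.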

Example: Ollama

  1. Download and install Ollama at https://ollama.com/download.
  2. Run Ollama, and follow the instructions on the site to download a local model of your choice.
  3. Open the Firefox settings screen, and go to AI Controls > Smart Window Settings > Assistant model, and select Custom: Use your own LLM.
  4. Fill in the fields:
    • Model name: Enter your model name from step 2 (for example, qwen3.5:4b).
    • Model endpoint: Enter the Ollama endpoint, which is typically http://localhost:11434/v1.
    • Note that no API key is required for Ollama.
  5. Click Save.
  6. Open a Smart Window, and start using the Assistant.
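As with Lemonade Server, you can check the Ollama endpoint outside Firefox before troubleshooting Smart Window itself. The sketch below (a minimal example; the model name is a placeholder from step 2) sends a chat-completion request to Ollama's OpenAI-compatible route; note that it carries no Authorization header, since Ollama requires no API key:

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434/v1"   # Ollama's endpoint from step 4
MODEL_NAME = "qwen3.5:4b"                # your model name from step 2 (placeholder)

# Ollama needs no API key, so the request has no Authorization header.
payload = {
    "model": MODEL_NAME,
    "messages": [{"role": "user", "content": "Say hello"}],
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment while Ollama is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

A successful reply here means the endpoint and model name you entered in Firefox are correct.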
