Looks like a big pivot on target audience from developers to regular users, at least on the homepage https://ollama.com/ as a product. Before, it was all about the CLI versions of Ollama for devs, now it's not even mentioned. At the bottom of the blog post it says:
> For pure CLI versions of Ollama, standalone downloads are available on Ollama’s GitHub releases page.
Nothing against that, just an observation.
Previously I tested several local LLM apps, and the 2 best ones to me were LM Studio [1] and Msty [2]. Will check this one out for sure.
One missing feature that the ChatGPT desktop app has and I think is a good idea for these local LLM apps is a shortcut to open a new chat anytime (Alt + Space), with a reduced UI. It is great for quick questions.
[1] https://lmstudio.ai/
[2] https://msty.app/
Heads up, there’s a fair bit of pushback (justified or not) on r/LocalLLaMA about Ollama’s tactics:
- Vendor lock-in: AFAIK it now uses a proprietary llama.cpp fork and has built its own registry on ollama.com in a kind of Docker way (I heard Docker people are actually behind Ollama), and it's a bit difficult to reuse model binaries with other inference engines due to their use of hashed filenames on disk, etc.
- Closed-source tweaks: Many llama.cpp improvements haven't been upstreamed or credited, raising GPL concerns. They have since switched to their own inference backend.
- Mixed performance: The same models often run slower or give worse outputs than plain llama.cpp. A tradeoff for convenience, I know.
- Opaque model naming: Rebrands or filters community models without transparency. The biggest fail was calling the smaller DeepSeek-R1 distills just "DeepSeek-R1", adding to massive confusion on social media and from "AI content creators" claiming you can run "THE" DeepSeek-R1 on any potato.
- Difficult to change Context Window default: Using Ollama as a backend, it is difficult to change the default context window size on the fly, leading to hallucinations and endless loops in output, especially for agents / thinking models.
---
If you want better (and in some cases more open) alternatives:
- llama.cpp: Battle-tested C++ engine with minimal deps, faster, with many optimizations
- ik_llama.cpp: High-perf fork, even faster than default llama.cpp
- llama-swap: YAML-driven model swapping for your endpoint
- LM Studio: GUI for any GGUF model—no proprietary formats, with all the llama.cpp optimizations available
- Open WebUI: Front-end that plugs into llama.cpp, ollama, MPT, etc.
“Justified or not” is certainly a useful caveat when giving the same credit to a few people who complain loudly with mostly inauthentic complaints.
> Vendor lock-in
That is, probably the most ridiculous of the statements. Ollama is open source, llama.cpp is open source, llamafiles are zip files that contain quantized versions of models openly available to be run with numerous other providers. Their llama.cpp changes are primarily for performance and compatibility. Yes, they run a registry on ollama.com for pre-packed, pre-quantized versions of models that are, again, openly available.
> Closed-source tweaks
Oh, so many things wrong in one short sentence. Llama.cpp is MIT licensed, not GPL licensed. A proprietary fork is perfectly legitimate use. Also... “proprietary“? The source code is literally available, including the patches, on GitHub in the ollama/ollama project, in the “llama” folder, with a patch file as recent as yesterday?
> Mixed Performance
Yes, almost anything suffers degraded performance when the goal is usability instead of performance. It is why people use C# instead of Assembly or punch cards. Performance isn’t the only metric, which makes this a useless point.
> Opaque model name
Sure, their official models have some ambiguities sometimes. I don’t know that it’s the “problem” people make it out to be when Ollama is designed for average people to run models, and so a decision like “ollama run qwen3” resolving to the option most people can run, rather than the absolute maximum best option possible, makes sense. Do you really think it is advantageous or user friendly, when Tommy wants to try out “DeepSeek-R1” on his potato laptop, for the default to be a 671B-parameter model too large to fit on almost any consumer computer, and that the current behavior is instead meant as a “deception”? That seems… disingenuous. Not to mention, they are clearly listed as such on ollama.com, where in black and white it says deepseek-r1 by default refers to the Qwen-based model, and that the full model is available as deepseek-r1:671b.
> Context Window
Probably the only fair and legitimate criticism of your entire comment.
I’m not an ollama defender or champion, couldn’t care about the company, and I barely use ollama (mostly just to run qwen3-8b for embedding). It really is just that most of these complaints you’re sharing from others seem to have TikTok-level fact checking.
I am somewhat surprised that this app doesn't seem to offer any way to connect to a remote Ollama instance. The most powerful computer I own isn't necessarily the one I'm running the GUI on.
This. This. A thousand times this. I hate Windows / MacOS but love their desktops. I love Linux / BSD but hate their desktops. So my most expensive most powerful workstation is always a headless Linux machine that I ssh into from a Windows or MacOS toy computer. Unfortunately most developers do not understand this. Every time I run a command in the terminal and it tries to open a browser tab without printing the URL, it makes me want to scream and shout and retire from tech forever to be a plumber.
You can replace the xdg-open command (or whichever command is used on your linux system) with your own. Just program it to fire over the url to a waiting socket on your windows box, and have it automatically open there. The details are pretty easy to work out, and the result will be seamless.
I usually do this with a port forward (ip or Unix socket) over SSH. This way my server just sends data to ~/.tunnel/socket, and my SSH connection handles getting it to my client.
(It’s a bit more complicated with starting a listening server in my laptop, making sure the port forwarded file doesn’t exist, etc, but this is the basic idea.)
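A minimal sketch of the setup described above (the port number, stub path, and use of netcat are illustrative, and netcat flags differ between implementations):

    # replace xdg-open on the Linux workstation with a stub that ships the URL home
    # e.g. /usr/local/bin/xdg-open containing:
    printf '%s\n' "$1" | nc -N localhost 7777

    # from the laptop, connect with a reverse tunnel so the workstation's port 7777 reaches back
    ssh -R 7777:localhost:7777 user@workstation

    # on the laptop, listen and open whatever URL arrives (swap "open" for the OS's equivalent)
    while true; do nc -l 7777 | xargs -n1 open; done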
Or just display the URL in terminal. I spent 5 years of my life ricing my Linux machine to get it as I want it to be only to realise that, at least for my needs and likes, nothing matches MacOS’s DE, compositor and font rendering.
Not a bash on Linux desktop users, just my experience.
You can work around this by using SSH port forwarding (ssh -L 11434:localhost:11434 user@remote) to connect to a remote Ollama instance, though native support would definitely be better.
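For what it's worth, the ollama CLI will happily talk through that tunnel too; with the default port mapping nothing extra is needed, and a non-default local port can be selected via the OLLAMA_HOST variable:

    ssh -N -L 11434:localhost:11434 user@remote &   # remote API now appears on localhost:11434
    ollama list                                     # local CLI and apps see the remote models
    # with a different local port, point clients at it explicitly:
    # OLLAMA_HOST=127.0.0.1:12345 ollama list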
But it seems like the GUI already connects over the network, no? In that case, why do you need to do user research for adding what is basically a command line option, at its simplest? It would probably take less time to add that than to write the comment.
They will have to support auth if they are adding support for connecting to a remote host. It's not difficult, but it's not as trivial as you suggested.
It's definitely coming, there is no way they would leave such an important feature on the table. My guess is they are waiting so they can announce connections to their own servers.
I gave the Ollama UI a try on Windows after using the CLI service for a while.
- I like the simplicity. This would be perfect for setting up a non-technical friend or family member with a local LLM with just a couple clicks
- Multimodal and Markdown support works as expected
- The model dropdown shows both your local models and other popular models available in the registry
I could see using this over Open WebUI for basic use cases where one doesn't need to dial in the prompt or advanced parameters. Maybe those will be exposed later. But for now - I feel the simplicity is a strength.
Small update: thinking models also work well. I like that it shows the thinking stream in a fainter style while it generates, then hides it to show the final output when it's ready. The thinking output is still available with a click.
Another commenter mentioned not being able to point the new UI to a remote Ollama instance - I agree, that would be super handy for running the UI on a slow machine but inferring on something more powerful.
For all of Electron's promise in being cross-platform, "I'll just press this button and ship this Electron app on Linux and everything will be fine" is not the current state of things. A lot of it is papercuts like glibc version aggravation, but GPU support is persistently problematic.
The Element app on Linux is currently broken (if you want to use encryption, so basically for everyone) due to an issue with Electron. Luckily it still works in a regular browser. I'm really baffled by how that can happen.
I believe power users or developers can already use this from CLI in Linux. This new app for Windows and MacOS shows this is intended for regular users.
I've been on something of a quest to find a really good chat interface for LLMs.
Most important feature for me is that I want to be able to chat with local models, remote models on my other machines, and cloud models (OpenAI API compatible). Anything that makes it easier to switch between models or query them simultaneously is important.
Here's what I've learned so far:
* Msty - my current favorite. Can do true simultaneous requests to multiple models. Nice aesthetic. Sadly not open source. Have had some freezing issues on Linux.
* Jan.ai - Can't make requests to multiple models simultaneously
* LM Studio - Not open source. Doesn't support remote/cloud models (maybe there's a plugin?)
* GPT4All - Was getting weird JSON errors with openrouter models. Have to explicitly switch between models, even if you're trying to use them from different chats.
Still to try: Librechat, Open WebUI, AnythingLLM, koboldcpp.
I've been in the same quest for a while. Here's my list, not a recommendation or endorsement list, just a list of alternative clients I've considered, tried or am still evaluating:
- chatbox - https://github.com/chatboxai/chatbox - free and OSS, with a paid tier, supports MCP and local/remote, has a local KB, works well so far and looks promising.
- macai - https://github.com/Renset/macai simple client for remote APIs, does not support image pasting or MCP or anything really, very limited, crashes.
- typingmind.com - web, with a downloadable (if paid) version. Not OSS, but one-time payment, indie dev. One of the first alt chat clients I've ever tried; not using it anymore. Somewhat clunky GUI, but OK. Supports MCP, haven't tried it.
- Open WebUI - deployed for our team so that we could chat through many APIs. Works well for a multi-user web-deployment, but image generation hasn't been working. I don't like it as a personal client though, buggy sometimes but gets frequent fixes fortunately.
- jan.ai - it comes with popular models pre-populated, which makes it harder to plug into custom or local model servers. But it supports local model deployment within the app (like what Ollama is announcing), which is good for people who don't want to deal with starting a server. I haven't played with it enough, but I personally prefer to deploy a local server (i.e. ollama, litellm...) and then just have the chat GUI app give me a flexible endpoint configuration for adding custom models to it.
I'm also wary of evil actors deploying chat GUIs just to farm your API keys. You should be too. Use disposable api keys, watch usage, refresh with new keys once in a while after trying clients.
do you have any screenshots? the home page shows a picture of a tamagotchi but none of the actual chat interface, which makes me wonder if I’m outside of the target audience
Last I tried OpenWebUI (A few months ago), it was pretty painful to connect non-OpenAI externally hosted models. There was a workaround that involved installing a 3rd party "function" (or was it a "pipeline"?), but it didn't feel smooth.
Is this easier now? Specifically, I would like to easily connect anthropic models just by plugging in my API key.
I like WebUI but it’s weird and complicated how you have to set up the different models (via text files in the browser; the instructions contain a lot of confusing terms). Librechat is nice but I can’t get it to not log me out every 5 min, which makes it unusable. I’ve been told it keeps you logged in when using https, but I use Tailscale so that is difficult (when running multiple services on a single host).
CherryStudio is a power tool for this case https://github.com/CherryHQ/cherry-studio -- has MCP, search, personas, and reasoning support too. i use it heavily with llama.cpp + llama-swap
I've been using AnythingLLM for a couple months now and really like it. You can organize different "Workspaces" which are models + specific prompts and it supports Ollama along with the major LLM providers.
I have it running in a docker container on a raspberry pi and then I use Tailscale to make it accessible anywhere. It looks good on mobile too so it's pretty seamless.
I use that and Raycast's Claude extension for random questions and that's pretty much does everything I want.
Build your own! It's a great way to learn, keeps you interested in the latest developments. Plus you get to try out cool UX experiments and see what works. I built my own interface back in 2023 and have been slowly adding to it since. I added local models via MLX last month. I'm surprised more devs aren't rolling their own interface, they are easy to make and you learn a lot.
gptel in emacs does this. You can run the same prompt against different models in separate emacs windows (local or via api w/ keys) at the same time to compare outputs. I highly recommended it. https://github.com/karthink/gptel
Our team has been using openwebui as the interface for our stack of open source models we run internally at work and it’s been fantastic! It has a great feature set, good support for MCPs, and is easy to stand up and maintain.
Open WebUI is definitely what you want. Supports any OpenAI-compatible provider, lets you manually configure your model list and settings for each model in a very user-friendly way, switching between models is instant, and it lets you send the same prompt to multiple models simultaneously in the same chat and displays them side by side.
Not surprising; Ollama is set on becoming the standard interface for companies to deploy "open" models. The focus on "local" is incidental, and likely not long term. I'm sure Ollama is going to announce a plan to use "open" models through their own cloud-based API using this app.
Strongly disagree with this. It is the default go-to for companies that cannot use cloud-based services for IP or regulatory reasons (think of defense contractors). Isn't that the main reason to use "open" models, which are still weaker than closed ones?
> Ollama is set on becoming the standard interface for companies to deploy "open" models.
That's not what I've been seeing, but obviously my perspective (as anyone's) is limited. What I'm seeing is deployments of vLLM, SGLang, llama.cpp or even HuggingFace's Transformers with their own wrapper, at least for inference with open weight models. Somehow, the only place where I come across recommendations for running Ollama was on HN and before on r/LocalLlama but not even there as of late. The people who used to run Ollama for local inference (+ OpenWebUI) now seem to mostly be running LM Studio, myself included too.
I have been happy using Ollama via the command line and via API, but I am sold on their new UI for coding. I was just using the newly updated qwen3:30b model for coding, and I like the <copy> button in the top right corner of generated code listings - a simple thing but useful.
If you’re a power user of these LLMs and have coding experience, I actually recommend just whipping together your own bespoke chat UI that you can customize however you like. Grab any OpenAI compatible endpoint for inference and a frontend component framework (many of which have added standard Chat components) - the rest is almost trivial. I threw one together in a week with Gemini’s assistance and now I use it every day. Is it production ready? Hell no but it works exactly how I want it to and whenever I find myself saying “I wish it could do XYZ…” I just add it.
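For anyone curious how little plumbing that takes: Ollama exposes an OpenAI-compatible endpoint, so the core request of such a homegrown UI is roughly a single call like this (the model name is just an example):

    curl http://localhost:11434/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
            "model": "qwen3:30b",
            "messages": [{"role": "user", "content": "Summarize this diff for me..."}],
            "stream": false
          }'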
Kinda odd to be so dismissive of this mindset given this website's title. Whipping up your own chat UI really is not that hard and is a pretty fun exercise. Knowing how your tools work and being able to tweak them to your specific use cases kinda rules!
There is a big difference between fun exercise and actually creating something that competes with the apps you can download. Building something on par with Claude Desktop, ChatGPT Desktop, etc. would be a lot of work. And I don't think the payoff would be there for most people.
Most people aren't hackers. Thanks to LLMs and vibe coding, even they can now take a can-do attitude to life that feels empowering. There's no longer any excuse to languish in helpless misery and negativity. You can just build things.
I have other things to do with my day than vibe-coding yet another stupid chat app with fewer features than one I can just download and get running in minutes. It’s not helplessness or misery, it’s just the finite number of hours I have in a day and the fact that other things are more interesting than that. I don’t grow my own wheat or maintain my own OS, either.
Yeah, ok, don't do it then. That doesn't mean because you do not want to bother, the suggestion is invalid for everyone here. There are a lot of people who just love to do their own thing, tinker with whatever they have on hand and then use the stuff they have created themselves.
its ok to let other people have fun programming and code dumb tools. you can decide yourself what you want to or not to work on, doesn't mean you should be so negative towards the idea of people who do want to code these things
I've only been lucky enough to find one opportunity in my entire twenty-seven year career to write something novel and new. Most of the time we're reinventing the wheel. What separates the winners from the losers is whether or not it's your wheel.
I only did it once some 15 years back (in a happy memory) using LFS. It took about a week to get to a functional system with basic necessities. A code finetuned model can write a functional chat UI with all common features and a decent UX in under a minute.
I have been exploring AI and LLMs. I built my own AI chat bot using Python [1], and then [2] the AI SDK from Vercel and OpenAI-compatible API endpoints. And eventually built a product around it.
this is not a coder
this helps with typing instructions. Coding is different. For example: look at my repository and tell me how to refactor it, write a new function, etc.
In my opinion you must change the name.
Yeah, I have one which lets me read a pdf and chat side by side, one which is integrated into my rss feed, one with insanely aggressive memory features (experimental) etc etc :)
> I don't know if parenting hits the "developer-tinkerer class" harder than others, but damn.
I sort of suspect so? Devs of parenting age trend towards being neurospicy, and dev work requires sustained attention with huge penalties for interruptions.
Likewise. I use Ollama as the API server and CLI interface for local models, and use OpenWebUI when I want a web interface (which TBH, isn't that often) and it's a fine combination. Honestly, the idea of Ollama adding their own chat interface UI never even occurred to me. It feels a little bit... unnecessary?
Still, choices are good, so props to the Ollama team!
I don't really care about that as a user. Maybe for FOSS purists it's important, but copyright is a thing I, as a techie, care nothing about. I can get it for free and I can see all the source code. I'm not going to build a fork, so the rest doesn't matter.
It's a phony BSD license, with an attempt to pass it off as the real thing with some verbiage. It's neither within the letter nor the spirit of the real BSD license.
I don't understand this move. A frontend desktop application is the opposite of what I and anyone else I know uses Ollama for. It's a local LLM backend. It's been around long enough now that any long term users have found, created and/or adjusted to their own front end interface.
I'm comfy, but some of the cutting edge local LLMs have been a little bit slow to be available recently, maybe this frontend focus is why.
I will now go and look at other options like Ollama that have either been fully UI integrated since the start, or that are committed to just being a headless backend. If any of them seem better, I'll consider switching; I probably should have done this sooner.
I hope this isn't the first step in Ollama dropping the local CLI focus, offering a subscription and becoming a generic LLM interface like so many of these tools seem to converge on.
A rightful worry, and we had the same doubts before we embarked on this. Ollama serves developers; there is no doubt about that. The CLI isn’t getting dropped. In fact, what we’ve learned is that having an interface interacting with Ollama is a great way for us to dogfood Ollama while building it.
There are so many choices for having an interface, and as a developer you should have a choice in selecting the UI you want. It will all continue to work with Ollama. Nothing about that changes.
Thanks for the response, appreciated. It confirms my feelings though: there are already so many choices for an interface, why are you - a team of people who built a backend LLM - now spending your time doing front end stuff under the same backend product name?
This is sending a very loud message that your focus is drifting away from why I use your product. If it was drifting away into something new and original that supplements my usage of your product, I could see the value, but like you said: there's already so many choices of good interface. Now you're going to have to play catchup against people whose first choice and genuine passion is LLM frontend UIs.
Sorry! I will still use ollama, and thank you so much for all the time and effort put in. I probably wouldn't have had a fraction of the local LLM fun I've had if it wasn't for ollama, even if my main usage is through openwebui. Ultimately, my personal preference is software that does 1 thing and does it well. Others prefer the opposite: tightly integrated all-bells-and-whistles, and I'm sure those people will appreciate this more than me - do what works for you, it's worked so far:)
I know, I often do that, but it's still not enough. E.g. things like SmolLM3, which required some llama.cpp tweaks, wouldn't work via GGUF for the first week after it had been released.
I just can't see a user-focused benefit for a backend service provider to start building and bundling their own frontend when there's already a bunch of widely used frontends available.
I’ve been building a Swift app [1], compatible with OpenAI APIs, easy model switching across providers, and with hotkeys for OS integration to capture text and images. It’s far more minimal than most other LLM frontends I’ve tried, but it’s been sticky for me.
Makes total sense. You cannot be constrained by the CLI when so much of what models do is multimodal and graphical. I don't think this dilutes their efforts in running the models or the CLI. In fact it's a huge enhancement and helps them penetrate the enterprise market in the long term. And the reality is, when you take VC funding for an open source tool, your customer base is going to be the enterprise and your inevitable goal is to become a profitable business. Do not let any of the delusions of Docker fool you. Build a thing, take VC money, and you need to return that investment with profit. Unfortunately, free things and dev-centric tooling often make it very difficult to establish that business model. So for Ollama, taking this UI approach potentially lets them monetize a lot of things around the GUI and leave the CLI tool free.
does anyone have a suggestion on running LLMs locally on a windows PC and then accessing them (thru an app / gui) on mac? My windows PC is a gaming PC with a pretty good GPU and I'd like to take advantage of that.
I have been experimenting with many LLMs in Ollama, but the open-source models are still behind paid versions like Cohere. If any model gives on-par performance and quality of results compared to Cohere, please let me know.
Aren't cohere's models pretty dated now? They don't even show up on leaderboards (synthetic or real) these days. What about GLM 4.5, Qwen 3 235b 2507 or even just Qwen 3 32b 2507 etc...
There's also Jan AI, which supports Linux, MCP, any Vulkan GPU, any Llama.cpp-compatible model, and optionally multiple cloud models as well. That seems like a better solution than this.
Choice is good, but here is why I prefer Ollama over others (I'm biased because I work on Ollama).
Supporting multiple backends is HARD. Originally, we thought we'd just add multiple backends to Ollama - MLX, ROCm, TRT-LLM, etc. It sounds really good on paper. In practice, you get into the lowest common denominator effect. What happens when you want to release Model A together with the model creator, and backend B doesn't support it? Do you ship partial support? If you do, then you start breaking your own product experience.
Supporting Vulkan for backwards compatibility on some hardware seems simple, right? What if I told you that in our testing, a portion of the supported hardware matrix sees a 20% decrease in performance. What about just cherry-picking which hardware uses Vulkan vs ROCm vs CUDA, etc.? Do you start managing a long and tedious support matrix, where each time a driver is updated, the support may shift?
Supporting flash attention sounds simple too, right? What if I told you that for over 20% of the hardware, and for specific models, enabling it will cause a non-trivial amount of errors pertaining to specific hardware/model combinations? We are almost in a spot where we can selectively enable flash attention per type of model architecture and hardware architecture.
It's so easy to add features, and hard to say no, but any given day, I will stand for a better overall product experience (at least to me, since it's very subjective). No is temporary and yes is forever.
Ollama focuses on running the model the way the model creators intended. I know we get a lot of negativity on naming, but oftentimes it's the naming we work out with the model creators (which surprisingly may or may not be how another platform named it on release). Over time, I think this means more focus on top models to optimize more and add capabilities to augment the models.
Sure, those are all difficult problems. Problems that single devs are dealing with every day and figuring out. Why is it so hard for Ollama?
What seems to be true is that Ollama wants to be a solution that drives the narrative and wants to choose for its users rather than with them. It uses a proprietary model library, it built itself on llama.cpp and didn't upstream its changes, it converted the standard gguf model weights into some unusable file type that only worked with itself, etc.
Sorry but I don't buy it. These are not intractable problems to deal with. These are excuses by former docker creators looking to destroy another ecosystem by attempting to coopt it for their own gain.
Started with Ollama; am at the stage of trying llama.cpp and realising their RPC just works, and Ollama's promises of distributed runs are just hanging in the air, so indeed the convenience of Ollama is starting to lose its appeal.
So, questions: what are the changes that they didn't upstream, is this listed somewhere? what is the impact? are they also changes in ggml? what was the point of the gguf format change?
^^^ absolutely spot on. There’s a big element of deception going on. I could respect it (and would trust the product more) if they were upfront about their motives and said “yes we are a venture backed startup and we have profit aspirations, but here’s XYZ thing we can promise.” Instead it’s all smoke and mirrors … super sus.
> Supporting multiple backends is HARD. Originally, we thought we'd just add multiple backends to Ollama - MLX, ROCm, TRT-LLM, etc. It sounds really good on paper. In practice, you get into the lowest common denominator effect. What happens when you want to release Model A together with the model creator, and backend B doesn't support it? Do you ship partial support? If you do, then you start breaking your own product experience.
You conceptually divide your product into a "universal experience" and a "conditional experience". You add platform-specific things to the conditional experience, while keeping the universal experience unified. I mean, do you even have a choice? The backend limits you; the only alternative you have is to change the backend upstream, which oftentimes is the same as no alternative.
The only case where this is a real problem is when the backends are so different that the universal experience is not the main experience. But I don't think this is the case here?
I tried Ollama once but immediately removed it, when I couldn't easily install models that are outside of the models they "support". LM Studio is by far the best tool out there in my humble opinion.
I need more than text.
I need to recognize audio to text (not only English), longer than 30s. I need to generate audio. And generate images. This is important; text is trivial.
I installed this last night on one of my cheaper computers. Ran gemma3:4b on a 16GB Ram laptop, I know HN loves specifics, so it's this exact computer ( with an upgraded 2 TB SSD).
ASUS - Vivobook S 14 - 14" OLED Laptop - Copilot+ PC - Intel Core Ultra 5 - 16GB Memory - 512GB SSD - Neutral Black
It's a bit slower than o4-mini and probably not as smart, but I feel more secure in asking for a resume review. The GUI really makes pasting in text significantly easier. Yeah I know I could just use the cli app and postman previously, but I didn't want to set that up.
Off-topic I suppose but the llama artwork looks quite good, and stylistically consistent between pieces. I wonder if it was done by a human artist or if generative models are just that good now.
Until now I've been able to reliably distinguish generated artwork from human authored artwork with ~90% accuracy. Of course, it's always getting better, but my initial research tells me the main logo has existed since Jan 2024: https://github.com/ollama/ollama/issues/2152
I don't think it was generated. (on the basis that this can't be some cutting-edge new model whose output I haven't seen yet)
Well, they gotta do what they gotta do. But as a developer, this kills the positioning and trust it had for me. I do not see it as a developer tool project anymore.
Ben, we've had private conversations about this previously. I don't see any VC money grab nor am I aware of any.
Building a product that we've dreamed of building is not wrong. Making money does not need to be evil. I, and the folks who worked tirelessly to make Ollama better will continue to build our dreams.
This doesn't appear to indicate whether the model is running locally, so I assume it's not. I'll continue to run Ollama locally in my terminal on the rare occasions that I see a use for it.
If I'm being honest, I care more about multiple local AI apps on my desktop all hooking into the same Ollama instance, rather than each downloading their own models as part of the app and leaving me with multiple tens of GBs of repeated weights all over the place because apps don't talk to each other.
I haven't used a local model in a while, but Ollama was the only one I've seen convert models into a different format (I think for deduplication). You should be able to, say, download a gguf file and point a bunch of frontends to that same file.
Honestly to be expected with a 4b model. 12b/14b+ is the minimum in my experience to get decent results, unless you have a specific use-case for the 4b ones and fine-tune it to your use.
I'm surprised it took this long. I vibe coded the same interface last year using Electron... just not Ollama, because there are just better architectures/pipelines...
No one should use ollama. A cursory search of r/localllama gives plenty of occasions where they've proven themselves bad actors. Here's a 'fun' overview
There are multiple (far better) options - eg LM studio if you want GUI, llama.cpp if you want the CLI that ollama ripped off. IMO the only reason ollama is even in the conversation is it was easy to get running on macOS, allowing the SV MBP set to feel included
/r/LocalLlama is a very circle-jerky subreddit. There's a very heavy "I am new to GitHub and have a lot of say"[0] energy. This is really unfortunate because there are also a lot of people doing tons of good work there and posting both cool links and their own projects. The "just give me an EXE" types will brigade causes they do not understand, white knight projects, and attack others for no informed, logical reason. They're not really a good barometer for the quality of any project, on the whole.
I literally just turned a fifteen year old MacPro5,1 into an Ollama terminal, using an ancient AMD VEGA56 GPU running Ubuntu 22... and it actually responds faster than I can type (which surprised me considering the age of this machine).
No prior Linux experience, beyond basic macOS Terminal commands. Surprisingly simple setup... and I used an online LLM to hold my hand as we walked through the installation / setup. If I wanted to call the CLI, I'd have to ask an online LLM what that code even is (something something ollama3.2).
>ollama is probably the easiest tool ... to experiment with LLMs locally.
Seems quite simple so far. If I can do it (blue collar electrician with no programming experience) then so can you.
completely useless move. there are already tons of good clients for Ollama. The Ollama devs need to focus on being a better llama.cpp, not building clients.
Ollama is a VC-funded company that ultimately needs a revenue model; they serve investors, not open source developers. Llama.cpp is a means to an end for them, not the goal. It's hard to monetize open source libraries. But a good chat client might lead to paying enterprise users.
Running good enough models locally is appealing to a lot of people and kind of hard if you are not a developer. If you are it's easy (been there done that). That's the core premise of the company. Their tech is of course widely used and for a while they've been focusing just on getting it to that stage. But that's never going to add up to revenue. So, they need to productize what they have.
Wow, is it a coincidence that every comment that says anything negative about ollama gets downvoted/flagged into oblivion? what is going on in this thread?
Barely any comments in the thread are flagged. The comment by swyx has a positive score.
Some comments have been downweighted for being generic or off-topic, which is standard moderation; our role as moderators is to keep the discussion threads on-topic. But the comment that was left at the top of the thread after I'd done that seemed at least somewhat negative/critical towards the Ollama team.
I'm sad this post is greyed out. I think it's a fair take.
Other critical takes say the same thing, but wrapped in far more variations of: "definitely not judging/criticising/being negative, but I don't like this."
This is clearly a new direction for Ollama, but I can't find anything at the link explaining or justifying why they're doing it, and that makes me uncomfortable as an existing regular Ollama user.
I think this move does deserve firmer feedback like yours.
Shameless plug: I’ve been building a native AI chat client called BoltAI[0] for the last 3 years. It’s native, feature-rich, and supports multiple AI services, including Ollama and LM Studio.
> Looks like a big pivot on target audience from developers to regular users
I'm one of the maintainers of Ollama. I don't see it as a pivot. We are all developers ourselves, and we use it.
In fact, there were many self-made prototypes before this from different individuals. We were hooked, so we built it for ourselves.
Ollama is made for developers, and our focus is on continually improving Ollama's capabilities.
Congratulations on launching the front-end, but I don't see how it can be made for developers and not have a Linux version.
I've never used a single linux GUI app in my 15 years of developing software. No company I've worked for even gives out linux laptops.
It's very strange, but they do have a Linux client that they refuse to mention in their blog post. I have no idea if this is a simple slip-up or if it was for some reason intentional.
https://ollama.com/download/linux
This link is for the existing cli version, not the new gui app.
Whoah, are you telling me that there are devs on Linux who use anything else than a tiled WM? CLI or GTFO /s
I just updated, and it was a bit annoying that by default gemma3:4b was selected, which I don't have locally. I guess it would be nicer to default to one of the models that are present.
It was nice that it started downloading it, but there was also no indication beforehand that I don't have that model, until I opened the drop-down and saw the download buttons.
But of course nice job guys.
Thanks for the kind words. Sorry about that, we are working out some of the initial experience for Ollama.
The need to start a chat for a model that is not currently downloaded in order to initiate a download confused me for a minute the first time I tried it out. A more intuitive approach (which is the first thing I tried to do before I figured it out) might be to make the download icons in the model list clickable, to initiate a download. Then you could display a download progress bar in the list, and when models have been downloaded show a little "info" icon that is also clickable to display the model card, surface other model specific options, and enable deletion. Love the new UI, kudos!
Big thanks to you and your team for this. My first time trying offline models; will the GitHub CLI use the same models by default (macOS)?
Congratulations on the next release.
I really like using ollama as a backend to OpenWebUI.
I don't have any windows machines and I don't work primarily on macos, but I understand that's where all the paying developers are, in theory.
Did y'all consider a partnership with one of the existing UIs and bundling that, similar to the DuckDB approach?
I’m just curious because I don’t use Ollama and have some spare vram: How do you use it and what models do you use?
do you know why ollama hasn't updated its models in over a month while many fantastic models have been released in that time, most recently GLM 4.5? It's forcing me to use LM Studio which I for whatever reason absolutely do not prefer.
thank you guys for all your work on it, regardless
You know that if you go to Hugging Face and find a GGUF page, you can click on Deploy and select Ollama. It comes with “run” but whatever—just change it to pull. Has a jacked name, but it works.
Also, if you search on ollama’s models, you’ll see user ones that you can download too
GLM 4.5 has a new/modified architecture. From what I understand, MLX was really one of the only frameworks that had support for it as of yesterday. LM Studio supports MLX as one backend. Everyone else was/is still developing support for it.
Ollama has the new 235B and 30B Qwen3 models from this week, so it’s not as if they have done nothing for a month.
We work closely with the majority of research labs / model creators directly. Most of the time we will support models on release day. There are times when the release windows for major models are fairly close - and we just have to elect to support the models we believe will better serve the majority of users.
Nothing out of spite, and purely limited by the amount of effort required to support these models.
We are hopeful too -- users can technically add models to Ollama directly, although there is definitely some learning curve.
Would love to add models directly. And don't worry, we will figure it out!
just so you know, you can grab any gguf from huggingface and specify the quant like this:
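For example, something along these lines (the repository and quant tag here are placeholders; any public GGUF repo should work the same way):

    # pull (or run) a GGUF straight from Hugging Face, with the quant tag after the colon
    ollama pull hf.co/bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0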
qwen3 was updated less than a day ago: https://ollama.com/library/qwen3
Question since you are here, how long before tool-calling is enabled for Gemma3 models?
Seems that Google intends it to be that way - https://ai.google.dev/gemma/docs/capabilities/function-calli... . I suppose they are saying that the model is good enough that if you put the tool call format in the prompt it should be able to handle any format.
I use PetrosStav/gemma3-tools and it seems that it only works half of the time - the rest of the time the model calls the tool but it doesn't get properly parsed by Ollama.
Unfortunately, I don't think Gemma 3 supports tool calling well. It's not trained into the model, and the 'support' for tool calling is added after model training.
We are working with Google and trying to give feedback on improving tool calling capabilities for future Gemma models. Fingers crossed!
We can do function calling with Google's Gemma 3, just need to follow the right pattern.
All instruct templates are added in "post model training", and Gemma 3 works fine calling custom MCP tools using Jan and KoboldCpp.
You can do “tool calling” via Gemma3. The issue is that it all needs to be stuck in the user prompt as there’s no system prompt
Are there any plans to improve the observability toolset for developers? There is a myriad of AI chat apps, and there is no clear reason why another one from Ollama would be better. But Ollama is uniquely positioned to provide the best observability experience to its users because it owns the whole server stack; any other observability tool (e.g. Langfuse) can only treat it as yet another API black box.
Does the new app make it easier for users to expose the Ollama daemon on the network (and mDNS discovery)? It’s still trickier than needed for Home Assistant users to get started with Ollama (which tends to run on a different machine).
In the app settings now, there is a toggle for "Expose Ollama to the network" - it allows for other devices or services on the network to access Ollama.
There's a simple toggle for just that.
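For reference, on a headless install without the app, the usual way to get the same behavior is the OLLAMA_HOST server setting, roughly like this (the systemd service name may differ depending on how Ollama was installed):

    # make the daemon listen on all interfaces instead of only 127.0.0.1
    OLLAMA_HOST=0.0.0.0:11434 ollama serve

    # or, for the systemd service installed by the Linux script:
    # sudo systemctl edit ollama.service   then add Environment="OLLAMA_HOST=0.0.0.0:11434" under [Service]
    # sudo systemctl restart ollama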
This caught me out yesterday. I was trying to move models onto an external disk, and it seems to require re-installation? But there was no sign of the simple CLI option that was previously presented, and I gave up.
As a developer feature request, it would be great if ollama could support more than one location at once, so that it is possible to keep a couple of models 'live' but have the option to plug in an external disk, with extra models being picked up auto-magically based on the OLLAMA_MODELS path, please. Or maybe the server could present a simple HTML interface next to the API endpoint?
And just to say thanks for making these models easily accessible. I am agAInst AI generally, but it is nice to be able to have a play with these models locally. I haven't found one that covers Zig, but I appreciate the steady stream of new models to try. Thanks.
I just symbolically link the default model directory to a fast and cheap external drive. I agree that it would be nice to support multiple model directories.
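A rough sketch of that symlink approach (default store location shown for macOS/Linux; stop the server first and adjust the drive path), with the OLLAMA_MODELS variable mentioned above as an alternative:

    # move the existing store onto the external drive, then link it back
    mv ~/.ollama/models /Volumes/External/ollama-models
    ln -s /Volumes/External/ollama-models ~/.ollama/models

    # or point Ollama at the drive directly via the env var
    # OLLAMA_MODELS=/Volumes/External/ollama-models ollama serve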
How about including an MCP option?
I think welcoming another stream of users, in addition to developers, is a good idea.
Lots of people are trying to begin, many with Ollama, and helping to create beginners is never a bad thing with tech.
Many things can be for both developers and end-users. Developers can use the API directly; end users have more choices.
I think having a bash script as the linux installation is more of a stop-gap measure than truly supporting Linux. And ollama is FOSS compared to LM Studio and Msty (as someone who switched from ollama to LM Studio; I'm very happy to see the frontend development of ollama and an easier method of increasing the context length of a model).
> One missing feature that the ChatGPT desktop app has and I think is a good idea for these local LLM apps is a shortcut to open a new chat anytime (Alt + Space), with a reduced UI. It is great for quick questions.
This is exactly what I've implemented for my Qt C++ app: https://www.get-vox.com
This is actually positive even for devs. The more users have Ollama installed, the easier it is to release a desktop AI app for them without bundling additional models in your own app. It's easier to offer such users a free or cheaper subscription because you don't have the additional costs. The latest Qwen 30B models are really powerful.
Would be even better if there were an installation template that checks if Ollama is installed and, if not, downloads it as a sub-installation, first checking the user's computer specs for enough RAM and a fast enough CPU/GPU. Also an API to prompt the user (asking for permission) to install a specific model if it hasn't been installed.
> Would be even better if there were an installation template that checks if Ollama is installed and, if not, downloads it as a sub-installation..... Also an API to prompt the user (asking for permission) to install a specific model if it hasn't been installed.
That's actually what we've done for our own app [1]. It checks if Ollama and other dependencies are installed. No model is bundled with it. We prompt the user to install a model (you pick a model, click a button and we download the model; similar if you wish to remove a model). The aim is to make it quite simple for non-technical folks to use.
[1] https://ai.nocommandline.com/
I’d heard of Msty and briefly tried it before. I checked the website again and it looks quite feature-rich. I hadn’t known about LM Studio, and I see that it allows commercial use for free (which Msty doesn’t).
How would you compare and contrast between the two? My main use would be to use it as a tool with a chat interface rather than developing applications that talk to models.
I use Msty all the time and I love it. It just works and it's got all features I want now, including generating alternate responses, swapping models mid-chat, editing both sent messages and responses, ...
I also tried LM Studio a few months back. The interface felt overly complex and I got weird error messages which made it look like I'd have to manually fix errors in the underlying python environment. Would have been fine if it was for work, but I just wanted to play around with LLMs in my spare time so I couldn't be bothered.
LM studio just changed their terms to allow commercial usage for free and without any restrictions or additional actions required.
I've used Msty but it seems like LM studio is moving faster, which is kind of important in this space. For example Msty still doesn't support MCP
Have you seen https://pygpt.net/ ? Overloaded interface and little unfortunate name aside, this seems to be the best one I tried.
> little unfortunate name aside
What's wrong with the name? Are you referring to the GPT trademark? That was rejected.
I may be too picky, and on reflection I probably shouldn't be - it was just my first thought when I saw what the project actually is for the first time.
What I meant is the "Py" prefix is typically used for Python APIs/libraries, or Python bindings to libraries in other languages. Sometimes as a prefix for dev tool names like PyInstaller or PyEnv. It's just less often used for standalone apps, only to indicate the project was developed in Python.
This makes me really wonder about the relationship between Open WebUI & Ollama
Is there a way to get LM Studio to talk to remote OpenAI API servers and cloud providers?
If you have a Mac, I recommend https://boltai.com/ for that
LM Studio is for hosting/serving local LLMs. Its chat UI is secondary and is pretty limited.
Good to know, thanks. What do people generally use to connect to it for chat?
OpenWebUI seems to be the standard. Easy to spin it up in a docker container pointed to 127.0.0.1:1234/v1 and away you go.
Msty (msty.app). Currently they're working on Msty Studio which is only accessible to people with a license, but the desktop app is pretty good, it just doesn't have tool (MCP) support.
Doesn't Msty already run models? What is LM Studio needed for in this setup?
That feature is available in HugstonOne in a new tab, among other features :) Edit: It's incredible how unethical all the other developers are with their unrelated spam. Ollama is a great app and a pioneer of AI; kudos and my best thanks.
Heads up, there’s a fair bit of pushback (justified or not) on r/LocalLLaMA about Ollama’s tactics:
If you want better (in some cases more open) alternatives:
“Justified or not” is certainly a useful caveat when giving the same credit to a few people who complain loudly with mostly inauthentic complaints.
> Vendor lock-in
That is probably the most ridiculous of the statements. Ollama is open source, llama.cpp is open source, and llamafiles are zip files that contain quantized versions of models openly available to be run with numerous other providers. Their llama.cpp changes are primarily for performance and compatibility. Yes, they run a registry on ollama.com for pre-packed, pre-quantized versions of models that are, again, openly available.
> Closed-source tweaks
So many things wrong in such a short sentence. Llama.cpp is MIT licensed, not GPL licensed. A proprietary fork is a perfectly legitimate use. Also, “proprietary”? The source code is literally available, including the patches, on GitHub in the ollama/ollama project, in the “llama” folder, with a patch file as recent as yesterday.
> Mixed Performance
Yes, almost anything suffers degraded performance when the goal is usability instead of performance. It is why people use C# instead of Assembly or punch cards. Performance isn’t the only metric, which makes this a useless point.
> Opaque model name
Sure, their official models have some naming ambiguities sometimes. I don’t know that it is the “problem” people make it out to be: Ollama is designed for average people to run models, so a decision like “ollama run qwen3” resolving to the option most people can actually run, rather than the absolute maximum best option possible, makes sense. Do you really think it is advantageous or user friendly, when Tommy wants to try out “Deepseek-r1” on his potato laptop, for the right choice to be a 671b parameter model too large to fit on almost any consumer computer? And is the current default really meant as a “deception”? That seems…disingenuous. Not to mention, they are clearly listed as such on ollama.com, where in black and white it says that deepseek-r1 by default refers to the Qwen-based model, and that the full model is available as deepseek-r1:671b.
> Context Window
Probably the only fair and legitimate criticism of your entire comment.
I’m not an Ollama defender or champion, I couldn’t care less about the company, and I barely use Ollama (mostly just to run qwen3-8b for embedding). It really is just that most of these complaints you’re sharing from others seem to have TikTok-level fact checking.
And llama.cpp has a GUI out of the box that’s decent.
I am somewhat surprised that this app doesn't seem to offer any way to connect to a remote Ollama instance. The most powerful computer I own isn't necessarily the one I'm running the GUI on.
This. This. A thousand times this. I hate Windows / MacOS but love their desktops. I love Linux / BSD but hate their desktops. So my most expensive most powerful workstation is always a headless Linux machine that I ssh into from a Windows or MacOS toy computer. Unfortunately most developers do not understand this. Every time I run a command in the terminal and it tries to open a browser tab without printing the URL, it makes me want to scream and shout and retire from tech forever to be a plumber.
You can replace the xdg-open command (or whichever command is used on your linux system) with your own. Just program it to fire over the url to a waiting socket on your windows box, and have it automatically open there. The details are pretty easy to work out, and the result will be seamless.
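For what it's worth, here is a minimal sketch of that trick in Python (the hostname and port are made up; the idea is just a tiny sender on the headless box and a listener on the desktop machine):

    # open-remote.py - drop-in replacement for xdg-open on the headless Linux box
    # (assumes the desktop machine is reachable as "my-desktop.local" on port 9999)
    import socket, sys

    def send_url(url, host="my-desktop.local", port=9999):
        with socket.create_connection((host, port), timeout=5) as s:
            s.sendall(url.encode() + b"\n")

    if __name__ == "__main__":
        send_url(sys.argv[1])

    # url-listener.py - runs on the Windows/macOS box and opens whatever URL arrives
    import socketserver, webbrowser

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            url = self.rfile.readline().decode().strip()
            if url.startswith(("http://", "https://")):
                webbrowser.open(url)

    with socketserver.TCPServer(("0.0.0.0", 9999), Handler) as srv:
        srv.serve_forever()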
I usually do this with a port forward (ip or Unix socket) over SSH. This way my server just sends data to ~/.tunnel/socket, and my SSH connection handles getting it to my client.
(It’s a bit more complicated with starting a listening server in my laptop, making sure the port forwarded file doesn’t exist, etc, but this is the basic idea.)
I can recommend spending a day finding and configuring a window manager that suits your needs.
Or just display the URL in the terminal. I spent 5 years of my life ricing my Linux machine to get it as I want it to be, only to realise that, at least for my needs and likes, nothing matches MacOS’s DE, compositor and font rendering.
Not a bash on Linux desktop users, just my experience.
This was my thought. Given the huge range of desktop environments and window managers available there has to be one that suits you.
Probably one that suits you pretty much out of the box.
I doubt that she spent the time to create a cross platform C compiler library but didn't bother trying out a few Linux desktops.
I doubt that there is no wm that suits their needs.
Hey Justine! Thank you for all your fantastic work
You can work around this by using SSH port forwarding (ssh -L 11434:localhost:11434 user@remote) to connect to a remote Ollama instance, though native support would definitely be better.
Wait, it already connects over the network, it just doesn't let you specify the hostname? That's really surprising to me.
This is a feature we are looking into supporting. Thank you for reaffirming the need.
But it seems like the GUI already connects over the network, no? In that case, why do you need to do user research for adding what is basically a command line option, at its simplest? It would probably take less time to add that than to write the comment.
They will have to support auth if they are adding support for connecting with remote host. It's not difficult but it's not as trivial as you suggested.
Ollama server already supports authentication, and URLs already have a place for credentials.
I use ollama via the llm plugin: https://github.com/taketwo/llm-ollama?tab=readme-ov-file#oll...
The app does have a function to expose Ollama to the network, so perhaps it's coming?
It's definitely coming, there is no way they would leave such an important feature on the table. My guess is they are waiting so they can announce connections to their own servers.
I gave the Ollama UI a try on Windows after using the CLI service for a while.
- I like the simplicity. This would be perfect for setting up a non-technical friend or family member with a local LLM with just a couple clicks
- Multimodal and Markdown support works as expected
- The model dropdown shows both your local models and other popular models available in the registry
I could see using this over Open WebUI for basic use cases where one doesn't need to dial in the prompt or advanced parameters. Maybe those will be exposed later. But for now - I feel the simplicity is a strength.
Small update: thinking models also work well. I like that it shows the thinking stream in a fainter style while it generates, then hides it to show the final output when it's ready. The thinking output is still available with a click.
Another commenter mentioned not being able to point the new UI to a remote Ollama instance - I agree, that would be super handy for running the UI on a slow machine but inferring on something more powerful.
If you like simple, try out Jan as well. https://github.com/menloresearch/jan
Why not Linux? The UI looks to be some kind of Chrome-based thingy (probably Electron), so it should be easy to port to Linux.
Also is there a link to the source?
For all of Electron's promise in being cross-platform, "I'll just press this button and ship this Electron app on Linux and everything will be fine" is not the current state of things. A lot of it is papercuts like glibc version aggravation, but GPU support is persistently problematic.
The Element app on Linux is currently broken (if you want to use encryption, so basically for everyone) due to an issue with Electron. Luckily it still works in a regular browser. I'm really baffled by how that can happen.
> Download Ollama’s new app today on macOS and Windows.
> For pure CLI versions of Ollama, standalone downloads are available on Ollama’s GitHub releases page.
Sounds like closed source. Plus, as I checked, the app seems to be a Tauri app, as it uses the system webview instead of Chromium.
Electron… wonder how this can be marketed as native then.
Nowhere on the page does it state “native”. The person who submitted the story introduced “native”
Also, this is not an Electron app. It does use the system webview though.
At least that is already an improvement, away from ChromeOS Development Platform.
If I already have to use a bundled web app, I prefer Electron. My system doesn't have a "system webview"; I don't have webkit-gtk installed/compiled.
If I have to use a Web app, I prefer a server URL.
Where are they marketing it as native?
Native webview? ¯\_(ツ)_/¯
I believe power users or developers can already use this from CLI in Linux. This new app for Windows and MacOS shows this is intended for regular users.
Not releasing anything for Linux because regular users don't use it is a great way to never have regular users on Linux.
I am guessing that the Linux version was first (or the announcement was worded strangely), as it is available on their download page:
https://ollama.com/download
That's just the CLI versions.
This app has a GUI.
Ah, I missed that detail, thank you for clarifying.
I've been on something of a quest to find a really good chat interface for LLMs.
The most important feature for me is that I want to be able to chat with local models, remote models on my other machines, and cloud models (OpenAI API compatible). Anything that makes it easier to switch between models or query them simultaneously is important.
Here's what I've learned so far:
* Msty - my current favorite. Can do true simultaneous requests to multiple models. Nice aesthetic. Sadly not open source. Have had some freezing issues on Linux.
* Jan.ai - Can't make requests to multiple models simultaneously
* LM Studio - Not open source. Doesn't support remote/cloud models (maybe there's a plugin?)
* GPT4All - Was getting weird JSON errors with openrouter models. Have to explicitly switch between models, even if you're trying to use them from different chats.
Still to try: Librechat, Open WebUI, AnythingLLM, koboldcpp.
Would love to hear any other suggestions.
I've been in the same quest for a while. Here's my list, not a recommendation or endorsement list, just a list of alternative clients I've considered, tried or am still evaluating:
- chatbox - https://github.com/chatboxai/chatbox - free and OSS, with a paid tier, supports MCP and local/remote, has a local KB, works well so far and looks promising.
- macai - https://github.com/Renset/macai simple client for remote APIs, does not support image pasting or MCP or anything really, very limited, crashes.
- typingmind.com - web, with a downloadable (if paid) version. Not OSS, but one-time payment, indie dev. One of the first alt chat clients I've ever tried, not using it anymore. Somewhat clunky GUI, but OK. Supports MCP, haven't tried it yet.
- Open WebUI - deployed for our team so that we could chat through many APIs. Works well for a multi-user web-deployment, but image generation hasn't been working. I don't like it as a personal client though, buggy sometimes but gets frequent fixes fortunately.
- jan.ai - it comes with popular models pre-populated in the list, which makes it harder to plug into custom or local model servers. But it supports local model deployment within the app (like what Ollama is announcing), which is good for people who don't want to deal with starting a server. I haven't played with it enough, but I personally prefer to deploy a local server (i.e. Ollama, LiteLLM...) and then just have the chat GUI app give me a flexible endpoint configuration for adding custom models to it.
I'm also wary of evil actors deploying chat GUIs just to farm your API keys. You should be too. Use disposable api keys, watch usage, refresh with new keys once in a while after trying clients.
I've been building this: https://dinoki.ai/
Works fully local, privacy first, and it's a native app (Swift for macOS, WPF for Windows)
do you have any screenshots? the home page shows a picture of a tamagotchi but none of the actual chat interface, which makes me wonder if I’m outside of the target audience
this looks so cool! :)
Thank you! Try it out and give us feedback
OpenWebUI is what you are looking for from a usability perspective. Supports many models chat.
Last I tried OpenWebUI (A few months ago), it was pretty painful to connect non-OpenAI externally hosted models. There was a workaround that involved installing a 3rd party "function" (or was it a "pipeline"?), but it didn't feel smooth.
Is this easier now? Specifically, I would like to easily connect anthropic models just by plugging in my API key.
The trick to this is to run a LiteLLM proxy that has all the connections to whatever you need to connect to and then point Open-WebUI to that.
I've been using this setup for several months now (over a year?) and it's very effective.
The proxy also benefits pretty much any other application you have that recognizes an OpenAI-compatible API. (Or even if it doesn't)
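As a rough sketch (not this poster's exact setup; the port and model alias are assumptions), once the LiteLLM proxy is running, anything that speaks the OpenAI API can point at it, whether that's Open WebUI or a few lines of Python:

    from openai import OpenAI

    # The proxy holds the real provider keys; clients only need its URL.
    client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-anything")
    resp = client.chat.completions.create(
        model="claude-sonnet",  # whatever alias you configured in the proxy's model list
        messages=[{"role": "user", "content": "Hello via the proxy"}],
    )
    print(resp.choices[0].message.content)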
I tried LibreChat and OpenWebUI, between the two I would recommend OpenWebUI.
It feels a bit less polished but has more functions that run locally and things work better out of the box.
My favorite thing is that I can just type my own questions / requests in markdown so I can get formatting and syntax highlighting.
OpenWebUI refuses to support MCP and uses an MCP to OpenAPI proxy which often doesn't work. If you don't like or need MCP, then it is a good choice.
> refuses to support MCP
Why is that? Seems the way to go to add tooling to any LLM that is tool-capable
The dev is very opinionated
Does https://github.com/open-webui/mcpo/ not fill this gap?
That is the initially mentioned MCP to OpenAPI proxy.
EDIT: To add a bit, MCP is more than just tools, which is the only use case MCPO supports.
Another +1 for OpenWebUI. Development is also going really fast <3
I like WebUI, but it's weird and complicated how you have to set up the different models (via text files in the browser; the instructions contain a lot of confusing terms). LibreChat is nice, but I can't get it to not log me out every 5 minutes, which makes it unusable. I've been told it keeps you logged in when using HTTPS, but I use Tailscale so that is difficult (when doing multiple services on a single host).
CherryStudio is a power tool for this case https://github.com/CherryHQ/cherry-studio -- has MCP, search, personas, and reasoning support too. i use it heavily with llama.cpp + llama-swap
Have fun on their Issues page if you don't read and write Chinese. Documentation pages are written in Chinese as well.
I've been using AnythingLLM for a couple months now and really like it. You can organize different "Workspaces" which are models + specific prompts, and it supports Ollama along with the major LLM providers. I have it running in a docker container on a raspberry pi and then I use Tailscale to make it accessible anywhere. It looks good on mobile too so it's pretty seamless. I use that and Raycast's Claude extension for random questions, and that pretty much does everything I want.
Build your own! It's a great way to learn, keeps you interested in the latest developments. Plus you get to try out cool UX experiments and see what works. I built my own interface back in 2023 and have been slowly adding to it since. I added local models via MLX last month. I'm surprised more devs aren't rolling their own interface, they are easy to make and you learn a lot.
gptel in emacs does this. You can run the same prompt against different models in separate emacs windows (local or via api w/ keys) at the same time to compare outputs. I highly recommended it. https://github.com/karthink/gptel
This is a snappy (perhaps too snappy at the expense of quality) open source mobile app for iPhones called Cactus Chat: https://apps.apple.com/us/app/cactus-chat/id6744444212
Ask me anything.
Electron. Python backend. Can talk to Ollama and other backends.
Need help with design and packaging.
https://github.com/adsharma/ask-me-anything
Thoughts about https://chorus.sh/
I have tried most of those and tend to go back to dify.ai. Open source, connects to remote endpoints, test up to 4 models at a time.
I can create workflows that use multiple models to achieve different goals.
Our team has been using openwebui as the interface for our stack of open source models we run internally at work and it’s been fantastic! It has a great feature set, good support for MCPs, and is easy to stand up and maintain.
https://boltai.com/
Once you use Open WebUI, your search will come to an end. Tread carefully.
Open WebUI is definitely what you want. Supports any OpenAI-compatible provider, lets you manually configure your model list and settings for each model in a very user-friendly way, switching between models is instant, and it lets you send the same prompt to multiple models simultaneously in the same chat and displays them side by side.
This is something you can vibe code in a day. I vibe coded something similar as a component for my larger project.
Not surprising; Ollama is set on becoming the standard interface for companies to deploy "open" models. The focus on "local" is incidental, and likely not long term. I'm sure Ollama is going to announce a plan to use "open" models through their own cloud-based API using this app.
> The focus on "local" is incidental
Strongly disagree with this. It is the default go-to for companies that cannot use cloud-based services for IP or regulatory reasons (think of defense contractors). Isn't that the main reason to use "open" models, which are still weaker than closed ones?
We are specifically using Ollama, because our stuff CANNOT leave the company internal net.
Any whiff of a cloud service and the lawyers will freak out.
That's why we run models via Ollama on our laptops (M-series is crazy powerful) and a few servers on the intranet for more oomph.
LM Studio changed their license to allow commercial use without "call me" pricing, so we might look into that more too.
> Ollama is set on becoming the standard interface for companies to deploy "open" models.
That's not what I've been seeing, but obviously my perspective (as anyone's) is limited. What I'm seeing is deployments of vLLM, SGLang, llama.cpp or even HuggingFace's Transformers with their own wrapper, at least for inference with open weight models. Somehow, the only place where I come across recommendations for running Ollama was on HN and before on r/LocalLlama but not even there as of late. The people who used to run Ollama for local inference (+ OpenWebUI) now seem to mostly be running LM Studio, myself included too.
I have been happy using Ollama via the command line and via API, but I am sold on their new UI for coding. I was just using the newly updated qwen3:30b model for coding, and I like the <copy> button in the top right corner of generated code listings - a simple thing, but useful.
If you’re a power user of these LLMs and have coding experience, I actually recommend just whipping together your own bespoke chat UI that you can customize however you like. Grab any OpenAI compatible endpoint for inference and a frontend component framework (many of which have added standard Chat components) - the rest is almost trivial. I threw one together in a week with Gemini’s assistance and now I use it every day. Is it production ready? Hell no but it works exactly how I want it to and whenever I find myself saying “I wish it could do XYZ…” I just add it.
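For a sense of scale, the inference side of such a bespoke client can be as small as a terminal loop against any OpenAI-compatible endpoint. The sketch below assumes a local Ollama exposing its OpenAI-compatible API on port 11434 and a model name you'd swap for whatever you actually run:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
    history = [{"role": "system", "content": "You are a helpful assistant."}]

    while True:
        history.append({"role": "user", "content": input("> ")})
        stream = client.chat.completions.create(
            model="qwen3:8b",  # assumption: substitute your own model tag
            messages=history,
            stream=True,
        )
        reply = ""
        for chunk in stream:
            delta = chunk.choices[0].delta.content or ""
            print(delta, end="", flush=True)
            reply += delta
        print()
        history.append({"role": "assistant", "content": reply})

The frontend framework then mostly just renders `history`, which is where the customization fun happens.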
This is the most "just build your own Linux" comment I read this year.
Just download some tool and be productive within seconds, I'd say.
Kinda odd to be so dismissive of this mindset given this websites title. Whipping up your own chatui really is not that hard and is a pretty fun exercise. Knowing how your tools work and being able to tweak them to your specific usecases kinda rules!
There is a big difference between fun exercise and actually creating something that competes with the apps you can download. Building something on par with Claude Desktop, ChatGPT Desktop, etc. would be a lot of work. And I don't think the payoff would be there for most people.
Most people aren't hackers. Thanks to LLMs and vibe coding, even they can now take a can-do attitude to life that feels empowering. There's no longer any excuse to languish in helpless misery and negativity. You can just build things.
I have other things to do with my day than vibe-coding yet another stupid chat app with fewer features than one I can just download and get running in minutes. It’s not helplessness or misery, it’s just the finite number of hours I have in a day and the fact that other things are more interesting than that. I don’t grow my own wheat or maintain my own OS, either.
Yeah, ok, don't do it then. That doesn't mean because you do not want to bother, the suggestion is invalid for everyone here. There are a lot of people who just love to do their own thing, tinker with whatever they have on hand and then use the stuff they have created themselves.
its ok to let other people have fun programming and code dumb tools. you can decide yourself what you want to or not to work on, doesn't mean you should be so negative towards the idea of people who do want to code these things
But you can build something new instead of reinventing the wheel for the thousandth time.
I've only been lucky enough to find one opportunity in my entire twenty-seven year career to write something novel and new. Most of the time we're reinventing the wheel. What separates the winners from the losers is whether or not it's your wheel.
I don't know, you don't have to invent cosmopolitan libc to qualify as new, but my bar is higher than another LLM chat app.
Then again, maybe OP's app will have some new twist that will revolutionise everything.
I only did it once some 15 years back (a happy memory) using LFS. It took about a week to get to a functional system with basic necessities. A code-finetuned model can write a functional chat UI with all common features and a decent UX in under a minute.
I have been exploring AI and LLMs. I built my own AI chat bot using Python [1], and then [2] AI SDK from Vercel and OpenAI compatible API endpoints. And eventually build a product around it.
1. VT.ai https://github.com/vinhnx/VT.ai Python
2. VT Chat https://vtchat.io.vn: my own product
Yes I do that too. The important bit is the model. Rest is almost trivial. I had posted a Show HN here about the script I've been using which is open source now ( https://github.com/n-k/tinycoder ) ( https://news.ycombinator.com/item?id=44674856 ).
With a bit of help from ChatGPT etc., it was trivial to make, and I use it every day now. I may add DDG and GitHub search to it soon too.
This is not a coder; this helps with typing instructions. Coding is different: for example, look at my repository and tell me how to refactor it, write a new function, etc. In my opinion, you should change the name.
Yeah, I have one which lets me read a pdf and chat side by side, one which is integrated into my rss feed, one with insanely aggressive memory features (experimental) etc etc :)
Or you could use: https://github.com/open-webui/open-webui
Either directly or use it as a base for your own bespoke experience.
[flagged]
> Tell me you're not in charge of young kids without telling me you're not in charge of young kids
Please avoid internet tropes on HN.
https://news.ycombinator.com/newsguidelines.html
> I don't know if parenting hits the "developer-tinkerer class" harder than others, but damn.
I sort of suspect so? Devs of parenting age trend towards being neurospicy, and dev work requires sustained attention with huge penalties for interruptions.
I have a 1yo too, and I could do it. I used the other tools to make one which I liked.
> Tell me you're not in charge of young kids
Yeah, my wife would murder me as our kids yelled at me for various things
I've been using Open WebUI and have been blown away, it's a better ChatGPT interface than ChatGPT!
https://github.com/open-webui/open-webui
Curious how this compares to that, which has a ton of features and runs great
Likewise. I use Ollama as the API server and CLI interface for local models, and use OpenWebUI when I want a web interface (which TBH, isn't that often) and it's a fine combination. Honestly, the idea of Ollama adding their own chat interface UI never even occurred to me. It feels a little bit... unnecessary?
Still, choices are good, so props to the Ollama team!
Is the Open WebUI license still OSI-compatible? I saw some drama about this on reddit but I'm not sure about the current state.
https://docs.openwebui.com/license/
I don't really care about that as a user. Maybe it's important for FOSS purists, but copyright is a thing I, as a techie, care nothing about. I can get it for free and I can see all the source code. I'm not going to build a fork, so the rest doesn't matter.
It's a phony BSD license, with an attempt to pass it off as the real thing with some verbiage. It's neither within the letter nor the spirit of the real BSD license.
No it's not
That’s what I came to say. I made a tool for my Mac where I can highlight any text then set a hotkey to use that text in a query to an LLM.
Nice because it works on any text. Browser, IDE, email etc.
Isn’t that exactly how Firefox does it?
"Ollama’s new app is now available for macOS and Windows" linux sounds out for now
Shouldn't the LLM be able to code the linux version?
Never ask an AI hype bro why after years of coding agents and 10x productivity, software is just as shit as it always has been.
no, this is a scam
Vibe coding?
Click the download button and you will see Linux as an option.
Linux does not have the interface right now, and there are a lot of options for users. Open WebUI is an awesome project.
https://github.com/open-webui/open-webui
I don't understand this move. A frontend desktop application is the opposite of what I and anyone else I know uses Ollama for. It's a local LLM backend. It's been around long enough now that any long term users have found, created and/or adjusted to their own front end interface.
I'm comfy, but some of the cutting edge local LLMs have been a little bit slow to be available recently, maybe this frontend focus is why.
I will now go and look at other options like Ollama that have either been fully UI-integrated since the start, or that are committed to just being a headless backend. If any of them seem better, I'll consider switching; I probably should have done this sooner.
I hope this isn't the first step in Ollama dropping the local CLI focus, offering a subscription and becoming a generic LLM interface like so many of these tools seem to converge on.
Rightful worry, and we had the same doubts before we embarked on this. Ollama serves developers, there is no doubt about that. The CLI isn't getting dropped; in fact, what we've learned is that having an interface that interacts with Ollama is a great way for us to dogfood Ollama while building it.
There are so many choices for having an interface, and as a developer you should have a choice in selecting the UI you want. It will all continue to work with Ollama. Nothing about that changes.
Thanks for the response, appreciated. It confirms my feelings though: there are already so many choices for an interface, so why are you, a team of people who built a backend LLM runtime, now spending your time doing frontend work under the same backend product name?
This is sending a very loud message that your focus is drifting away from why I use your product. If it were drifting toward something new and original that supplements my usage of your product, I could see the value, but like you said: there are already so many choices of good interfaces. Now you're going to have to play catch-up against people whose first choice and genuine passion is LLM frontend UIs.
Sorry! I will still use ollama, and thank you so much for all the time and effort put in. I probably wouldn't have had a fraction of the local LLM fun I've had if it wasn't for ollama, even if my main usage is through openwebui. Ultimately, my personal preference is software that does 1 thing and does it well. Others prefer the opposite: tightly integrated all-bells-and-whistles, and I'm sure those people will appreciate this more than me - do what works for you, it's worked so far:)
> some of the cutting edge local LLMs have been a little bit slow to be available recently
You can pull models directly from Hugging Face: ollama pull hf.co/google/gemma-3-27b-it
I know, I often do that, but it's still not enough. E.g. things like SmolLM3, which required some llama.cpp tweaks, wouldn't work via GGUF for the first week after it had been released.
Just checked: https://github.com/ollama/ollama/issues/11340 is still an open issue.
There are many GUIs for Ollama.
This looks like a version of Ollama that bundles one.
I agree.
I just can't see a user-focused benefit for a backend service provider to start building and bundling their own frontend when there's already a bunch of widely used frontends available.
I’ve been building a Swift app [1], compatible with OpenAI APIs, easy model switching across providers, and with hotkeys for OS integration to capture text and images. It’s far more minimal than most other LLM frontends I’ve tried, but it’s been sticky for me.
[1]: https://www.wvlen.llc/apps/tomo
Makes total sense. You cannot be constrained by the CLI when so much of what models do is multimodal and graphical. I don't think this dilutes their efforts in running the models or the CLI. In fact it's a huge enhancement and helps them penetrate the enterprise market in the long term. And the reality is, when you take VC funding for an open source tool, your customer base is going to be the enterprise and your inevitable goal is to become a profitable business. Do not let any of the delusions of Docker fool you. Build a thing, take VC money, and you need to return that investment with profit. Unfortunately, free things and dev-centric tooling often make it very difficult to establish that business model. So for Ollama, taking this UI approach potentially lets them monetize a lot of things around the GUI and leave the CLI tool free.
Does anyone have a suggestion on running LLMs locally on a Windows PC and then accessing them (through an app/GUI) on a Mac? My Windows PC is a gaming PC with a pretty good GPU and I'd like to take advantage of that.
Start llamacpp server with the gui accessible to your Mac?
Interesting to see its source is missing from the GitHub repo. A pivot to closed source perhaps?
I have been experimenting with many LLMs in Ollama, but the open-source models are still behind paid offerings like Cohere. If any model gives on-par performance and quality of results compared to Cohere, please let me know.
Aren't cohere's models pretty dated now? They don't even show up on leaderboards (synthetic or real) these days. What about GLM 4.5, Qwen 3 235b 2507 or even just Qwen 3 32b 2507 etc...
There's also Jan AI, which supports Linux, MCP, any Vulkan GPU, any Llama.cpp-compatible model, and optionally multiple cloud models as well. That seems like a better solution than this.
Choice is good, but here is why I prefer Ollama over others (I'm biased because I work on Ollama).
Supporting multiple backends is HARD. Originally, we thought we'd just add multiple backends to Ollama - MLX, ROCm, TRT-LLM, etc. It sounds really good on paper. In practice, you get into the lowest common denominator effect. What happens when you want to release Model A together with the model creator, and backend B doesn't support it? Do you ship partial support? If you do, then you start breaking your own product experience.
Supporting Vulkan for backwards compatibility on some hardware seems simple, right? What if I told you that in our testing, a portion of the supported hardware matrix gets a 20% decrease in performance? What about just cherry-picking which hardware uses Vulkan vs ROCm vs CUDA, etc.? Do you start managing a long and tedious support matrix, where each time a driver is updated, the support may shift?
Supporting flash attention sounds simple too, right? What if I told you that for over 20% of the hardware and for specific models, enabling it causes a non-trivial amount of errors pertaining to specific hardware/model combinations? We are almost at a spot where we can selectively enable flash attention per type of model architecture and hardware architecture.
It's so easy to add features, and hard to say no, but given any day, I will stand for a better overall product experience (at least to me since it's very subjective). No is temporary and yes is forever.
Ollama focuses on running the model the way the model creators intended. I know we get a lot of negativity on naming, but oftentimes it's what we work out with the model creators directly (which surprisingly may or may not be how another platform named it on release). Over time, I think this means more focus on top models to optimize more and add capabilities to augment the models.
GPUStack doesn't seem to have the problem of lowest common denominator but supports many architectures.
https://github.com/gpustack/gpustack
Sure, those are all difficult problems. Problems that single devs are dealing with every day and figuring out. Why is it so hard for Ollama?
What seems to be true is that Ollama wants to be a solution that drives the narrative and wants to choose for its users rather than with them. It uses a proprietary model library, it built itself on llama.cpp and didn't upstream its changes, it converted the standard gguf model weights into some unusable file type that only worked with itself, etc.
Sorry but I don't buy it. These are not intractable problems to deal with. These are excuses by former docker creators looking to destroy another ecosystem by attempting to coopt it for their own gain.
Started with Ollama, am at the stage of trying llama.cpp and realising their RPC just works, while Ollama's promises of distributed runs are just hanging in the air, so indeed the convenience of Ollama is starting to lose its appeal.
So, questions: what are the changes that they didn't upstream, is this listed somewhere? what is the impact? are they also changes in ggml? what was the point of the gguf format change?
^^^ absolutely spot on. There’s a big element of deception going on. I could respect it (and would trust the product more) if they were upfront about their motives and said “yes, we are a venture backed startup and we have profit aspirations, but here’s XYZ thing we can promise.” Instead it’s all smoke and mirrors… super sus.
> Supporting multiple backends is HARD. Originally, we thought we'd just add multiple backends to Ollama - MLX, ROCm, TRT-LLM, etc. It sounds really good on paper. In practice, you get into the lowest common denominator effect. What happens when you want to release Model A together with the model creator, and backend B doesn't support it? Do you ship partial support? If you do, then you start breaking your own product experience.
You conceptually divide your product to "universal experience" and "conditional experience". You add platform-specific things to the conditional experience, while keeping universal experience unified. I mean, do you even have a choice? The backend limits you, the only alternative you have is to change the backend upstream, which often times is the same as no alternative.
The only case where this is a real problem is when the backends are so different that the universal experience is not the main experience. But I don't think this is the case here?
I tried Ollama once but immediately removed it, when I couldn't easily install models that are outside of the models they "support". LM Studio is by far the best tool out there in my humble opinion.
Ollama is a great app, and so is LM Studio, but HugstonOne offers a better experience.
I need more than text. I need audio-to-text recognition (not only English) for audio longer than 30s. I need audio generation. And image generation. This is important; text is trivial.
I absolutely love this.
I installed this last night on one of my cheaper computers and ran gemma3:4b on a 16GB RAM laptop. I know HN loves specifics, so it's this exact computer (with an upgraded 2 TB SSD):
ASUS - Vivobook S 14 - 14" OLED Laptop - Copilot+ PC - Intel Core Ultra 5 - 16GB Memory - 512GB SSD - Neutral Black
It's a bit slower than o4-mini and probably not as smart, but I feel more secure in asking for a resume review. The GUI really makes pasting in text significantly easier. Yeah I know I could just use the cli app and postman previously, but I didn't want to set that up.
For ordinary people's computers, the 4-core/16GB configuration and the installed model are relatively less practical.
I must be stupid, because I don't understand how this is different from having several tabs open in a browser with all the AI services I'd like open.
Ollama runs all local, away from prying corps.
OK, but there are plenty of web UIs that consume private APIs, even Ollama ones.
Yes, well, isn't this their new “official” one?
And now Ollama has one too. I don't think I'll use it, but makes sense to me.
Repackaging other people's work while adding literally nothing useful is Ollama's entire gig.
This hype bubble is as disgusting and scummy as the previous one.
Off-topic I suppose but the llama artwork looks quite good, and stylistically consistent between pieces. I wonder if it was done by a human artist or if generative models are just that good now.
I can't comment on whether these particular pieces were generated, but models are certainly good enough now to handle these cases and more
Until now I've been able to reliably distinguish generated artwork from human authored artwork with ~90% accuracy. Of course, it's always getting better, but my initial research tells me the main logo has existed since Jan 2024: https://github.com/ollama/ollama/issues/2152
I don't think it was generated. (on the basis that this can't be some cutting-edge new model whose output I haven't seen yet)
One of the maintainers. The logo and all the illustrations are done by a human artist.
[Shameless Plug]
I built my own Ollama macOS app written in SwiftUI: https://github.com/sheshbabu/Chital
Launches fast and weighs less than 2MB in size!
Kudos for making an effort to keep it small. Some of these Electron AppImages are 1GB+ which is pretty wild.
How do you handle markdown rendering?
Thanks!
I use this package for markdown: https://github.com/gonzalezreal/swift-markdown-ui
> Some of these Electron AppImages are 1GB+
I recently released an Electron App for Ollama [1] and it's nowhere close to 1GB (between 300 - 350MB). A 1GB App would be really big
1) https://ai.nocommandline.com/
I enjoy the Ollama CLI because the output can be redirected or piped; definitely a useful feature.
Does this one not fuck up the downloads like the CLI?
I hope they'll open source it so we can contribute new ideas to the app.
I don't think this is necessary on desktop, you can just use a browser.
For me, what is needed is a native assistant app for Android, something like Perplexity's assistant mode that replaces Gemini.
It would make using your own LLM with your phone much more useful, since you could then interact with apps.
Well, they gotta do what they gotta do. But as a developer, this kills the positioning and trust it had for me. I do not see it as a developer tool project anymore.
A little too late, I think.
It came very late indeed! By now, I’m already used to LM Studio as a UI for local LLMs… it even seems to have more features than ollama.
But I'm glad to know that Ollama developed a GUI as well; more options are always better, and maybe it will improve in the future.
Not too late for a VC money-grab tho.
Edit: I hope I'm wrong about this. Thanks for clarifying.
Ben, we've had private conversations about this previously. I don't see any VC money grab nor am I aware of any.
Building a product that we've dreamed of building is not wrong. Making money does not need to be evil. I, and the folks who worked tirelessly to make Ollama better will continue to build our dreams.
I mean you’re a YC backed startup soo it’s not like it’s out of the question lol
This doesn't appear to indicate whether the model is running locally, so I assume it's not. I'll continue to run Ollama locally in my terminal on the rare occasions that I see a use for it.
the model is running locally - you can check by turning off the wifi.
This is good. I much prefer this.
Aak! Finally! This was long overdue!
No Linux, that's a bummer. I've been using it in Linux for inference for ages, it's pretty easy to use.
I'll stick with OpenWebUI then.
It needs web search, because the smaller models often lack information.
Also a display of whether a model fits into vram would be nice.
Native or another electron crap?
It uses the system webview.
I mean, nice, but something like open-webui is just gazillions of times better. This is just a GUI for the command line...
finally, what took so long lmao
If I'm being honest, I care more about multiple local AI apps on my desktop all hooking into the same Ollama instance, rather than each downloading its own models as part of the app, leaving me with multiple tens of GBs of repeated weights all over the place because apps don't talk to each other.
What does it take for THAT to finally happen?
I haven't used a local model in a while, but Ollama was the only one I've seen convert models into a different format (I think for deduplication). You should be able to, say, download a GGUF file and point a bunch of frontends at that same file.
This is something we are working on. I don't have a specific timeline, since it's done when it's done, but it is being worked on.
That's already possible via the ollama API. It's up to applications themselves to support it (and plenty do).
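Concretely, a desktop app can reuse a single local Ollama instance (and its already-downloaded weights) with nothing more than an HTTP call; a minimal sketch in Python, assuming the default localhost:11434 and a model you already have pulled:

    import json, urllib.request

    # Call the shared local Ollama instance's documented /api/generate endpoint
    # instead of bundling and loading separate weights inside the app.
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": "gemma3:4b", "prompt": "Say hi", "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])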
And this move by Ollama is going exactly in the wrong direction.
Its finally the push I need to move away. I predict ollama will only get worse from here on.
I, too, dream of this.
symlinks
OK, for those interested, I have a Ryzen 9 2700 with a 12GB RX 6750 XT and downloaded Gemma3:4b because that was what the app suggested first.
The speed seems fine to me, but the hallucinations are wild, completely wrong on a few things I like to test the commercial offerings on.
For simple questions about the lua language and how to do things in Unity game engine the results look fairly OK.
Honestly to be expected with a 4b model. 12b/14b+ is the minimum in my experience to get decent results, unless you have a specific use-case for the 4b ones and fine-tune it to your use.
Yeah, I tried Deepseek 8b as well and it was hopeless, but it was interesting to watch it think. I haven't seen that before.
I'm surprised it took this long. I vibe coded the same interface last year using Electron... just not with Ollama, because there are just better architectures/pipelines...
And why should anyone use it or ollama itself?
No one should use Ollama. A cursory search of r/LocalLLaMA gives plenty of occasions where they've proven themselves bad actors. Here's a 'fun' overview:
https://www.reddit.com/r/LocalLLaMA/comments/1kg20mu/so_why_...
There are multiple (far better) options, e.g. LM Studio if you want a GUI, or llama.cpp if you want the CLI that Ollama ripped off. IMO the only reason Ollama is even in the conversation is that it was easy to get running on macOS, allowing the SV MBP set to feel included.
/r/LocalLlama is a very circle-jerky subreddit. There's a very heavy "I am new to GitHub and have a lot of say"[0] energy. This is really unfortunate, because there are also a lot of people doing tons of good work there and posting both cool links and their own projects. The "just give me an EXE" types will brigade causes they do not understand, white-knight projects, and attack others for no informed, logical reason. They're not really a good barometer for the quality of any project, on the whole.
[0] https://github.com/sherlock-project/sherlock/issues/2011
This is just wrong. Ollama has moved off of llama.cpp and is working with hardware partners to support GGML. https://ollama.com/blog/multimodal-models
is it?
https://github.com/ollama/ollama/blob/main/llm/server.go#L79
Can you substantiate this more? llama.cpp is also relying on ggml.
[dead]
ollama is probably the easiest tool to use if you want to experiment with LLMs locally.
I literally just turned a fifteen year old MacPro5,1 into an Ollama terminal, using an ancient AMD VEGA56 GPU running Ubuntu 22... and it actually responds faster than I can type (which surprised me considering the age of this machine).
No prior Linux experience, beyond basic Mac OS Terminal commands. Surprisingly simple setup... and I used an online LLM to hold my hand as we walked through the installation/setup. If I wanted to call the CLI, I'd have to ask an online LLM what that command even is (something something ollama3.2).
>ollama is probably the easiest tool ... to experiment with LLMs locally.
Seems quite simple so far. If I can do it (a blue collar electrician with no programming experience) then so can you.
That or llamafile, depending on details.
I just preface every prompt with "Ensure your answer is better than ChatGPT" and it has to do it because I told it to.
How long until we get tool support like MCP and other integrations with GitHub, YouTube, etc.?
completely useless move. there are already tons of good clients for Ollama. The Ollama devs need to focus on being a better llama.cpp, not building clients.
Ollama is a VC-funded company that ultimately needs a revenue model; they serve investors, not open source developers. Llama.cpp is a means to an end for them, not the goal. It's hard to monetize open source libraries. But a good chat client might lead to paying enterprise users.
Running good-enough models locally is appealing to a lot of people and kind of hard if you are not a developer. If you are, it's easy (been there, done that). That's the core premise of the company. Their tech is of course widely used, and for a while they've been focusing just on getting it to that stage. But that's never going to add up to revenue. So they need to productize what they have.
This looks like a potentially viable way.
In addition to what others have said, I've had great experiences with LobeChat: https://github.com/lobehub/lobe-chat
they got all the LLMs in the world, and all they can do is spit electron slop.. nice
[dead]
Wow, is it a coincidence that every comment that says anything negative about ollama gets downvoted/flagged into oblivion? what is going on in this thread?
We don't have this kind of power, and in fact, most of our posts get deleted, so we don't post. We do read comments and help if we can.
Negative comments help us grow and make Ollama better anyway. We can take harsh feedback to make Ollama better.
To be clear, I didn't mean you guys manipulate the comments. It was just weird seeing even @swyx's comment get that many downvotes.
Barely any comments in the thread are flagged. The comment by swyx has a positive score.
Some comments have been downweighted for being generic or off-topic, which is standard moderation; our role as moderators is to keep the discussion threads on-topic. But the comment that was left at the top of the thread after I'd done that seemed at least somewhat negative/critical towards the Ollama team.
Which comments seem unreasonably low to you?
I don't understand how you can know this, even if it is true. (I only see downvotes on my comments.)
Comments with negative scores appear greyed out.
Boo. Dumb. Own the local backend. So much more to do there.
Trying to match the even larger local front end ecosystem is just a waste of energy.
I'm sad this post is greyed out. I think it's a fair take.
Other critical takes say the same thing, but wrapped in far more variations of: "definitely not judging/criticising/being negative, but I don't like this."
This is clearly a new direction for Ollama, but I can't find anything at the link explaining or justifying why they're doing it, and that makes me uncomfortable as an existing regular Ollama user.
I think this move does deserve firmer feedback like yours.
What are some of the best small-scale models to run locally on a laptop, on Ollama?
Not sure but looks like HugstonOne is rocking in comparison :)
Congrats on the launch, Ollama team.
Shameless plug: I’ve been building a native AI chat client called BoltAI[0] for the last 3 years. It’s native, feature-rich, and supports multiple AI services, including Ollama and LM Studio.
Give it a try.
[0]: https://boltai.com