Ask HN: How does the same LLM “instance” serve multiple clients?
I’ve been playing with running LLMs locally and only then realized I have no idea how serving them scales (I don’t really know how LLMs work internally).
I’m assuming context is everything, but if the same LLM process can serve multiple clients, aren’t there risks of mixing up contexts between them? Does anyone have any insight?
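For what it's worth, the usual answer is that the model weights are shared (they're read-only during inference), while each request carries its own context (token history and attention KV cache), so batched requests stay isolated. A minimal sketch of that idea, with hypothetical names (`Session`, `Server.submit` are illustrative, not any real serving API):

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    """Per-client state: this client's token history only."""
    session_id: str
    tokens: list = field(default_factory=list)

class Server:
    """One shared 'model' process; per-session contexts are kept separate."""
    def __init__(self):
        self.sessions = {}  # session_id -> Session

    def submit(self, session_id, new_tokens):
        # Look up (or create) this client's session and append only to it;
        # nothing from other sessions is read or written.
        sess = self.sessions.setdefault(session_id, Session(session_id))
        sess.tokens.extend(new_tokens)
        return list(sess.tokens)

server = Server()
alice_ctx = server.submit("alice", ["Hello"])
bob_ctx = server.submit("bob", ["Hi"])
# alice_ctx and bob_ctx contain only their own tokens
```

Real serving stacks batch many such sessions through the model at once for throughput, but the per-request KV caches are kept in separate memory slots, so contexts don't leak between clients.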
Comments URL: https://news.ycombinator.com/item?id=43808145
Points: 1
# Comments: 0