[D] Usefulness of vector databases and their real-world applications

Key_Story_2768@alien.top · 2 years ago

AppropriateIf@alien.top · 2 years ago

Interesting slides. Your last bit about letting LLMs do keyword search makes a lot of sense and feels related to HyDE, which might be considered “letting LLMs do semantic vector search”: https://github.com/texttron/hyde

It also calls back to OpenAI’s experiments with WebGPT, where no semantic vectors were involved.

blackkettle@alien.top · 2 years ago

Plugins for OpenSearch and Postgres are useful. Dedicated vector DBs are not, IMO.

waffleseggs@alien.top · 2 years ago

No, everyone is crazy and not thinking at all right now. Vector databases are a great example of cargo culting, as are many other approaches in AI and ML.

I increasingly work with the embedding vectors, but I keep them in memory or in a regular database column. By keeping them in a regular database you can tag ordinary records with locations within embedding spaces, and you gain all kinds of helpful clustering and joining capabilities through embeddings tuned to specific tasks. You just loop over the hydrated records. You get all the same benefits and more.
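A minimal sketch of what "keep them in a regular database column and loop over the records" could look like, using only the Python standard library. The table name `docs`, the `topic` column, the JSON-text encoding of the embedding, and the toy 3-dimensional vectors are all assumptions made for illustration; they are not taken from the comment above.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_db(rows):
    """Create an in-memory SQLite table; the embedding is stored as JSON text."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, topic TEXT, embedding TEXT)"
    )
    conn.executemany(
        "INSERT INTO docs (title, topic, embedding) VALUES (?, ?, ?)",
        [(title, topic, json.dumps(vec)) for title, topic, vec in rows],
    )
    return conn


def search(conn, query_vec, topic=None, k=3):
    """Brute-force top-k: an ordinary SQL filter, then a loop over the rows."""
    sql = "SELECT title, embedding FROM docs"
    params = ()
    if topic is not None:
        sql += " WHERE topic = ?"
        params = (topic,)
    scored = [
        (cosine(query_vec, json.loads(emb)), title)
        for title, emb in conn.execute(sql, params)
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:k]]


# Hypothetical records with made-up embeddings.
ROWS = [
    ("cats", "pets", [1.0, 0.0, 0.0]),
    ("dogs", "pets", [0.9, 0.1, 0.0]),
    ("bonds", "finance", [0.0, 1.0, 0.0]),
    ("stocks", "finance", [0.1, 0.9, 0.1]),
]
conn = build_db(ROWS)
print(search(conn, [1.0, 0.05, 0.0], k=2))
print(search(conn, [1.0, 0.05, 0.0], topic="finance", k=1))
```

The scan is linear in the number of rows returned by the SQL filter, so this only illustrates the idea; whether it is fast enough depends on the table size and how selective the ordinary filters are.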

Far_Ambassador_6495@alien.top · 2 years ago

I agree on the overhype. I think you can get most of the features you are talking about through metadata tagging in vector DBs. So at that point it becomes a question of which is more affordable/quicker, and I guess we don’t definitively know.

But also to your point, some vector DBs cap the number of top-k similar results, so a query that matches more records than the cap wouldn’t return all of them the way a SQL WHERE query would.

In terms of semantic search, you are pretty much running the same process unless you are implementing some custom distance metric, which is doable in most vector DBs.

So, you are totally correct on the cargo culting thing but there could be a benefit if it is faster/cheaper or tremendous downside if it is slower/more expensive. I guess we will never know.

But the functionality is the same whether you choose the right vector DB or a relational DB.

** Edit **
If I am wrong, call me an idiot and let me know where I am wrong.

bestgreatestsuper@alien.top · 2 years ago

Locality-sensitive hashing gets you fast multidimensional retrieval on traditional databases, so vector databases aren’t important unless you need to detect similarity in feature space. I thought vector databases were important until I learned about LSH, because I assumed high-dimensional retrieval was slow and wasn’t exploiting concentration of measure.
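For readers who haven't seen LSH, here is a rough sketch of one common variant, random-hyperplane hashing for cosine similarity. It is only one of several LSH schemes and not necessarily the one meant above. The function names, the 8-bit key length, and the toy vectors are assumptions for illustration. The point is that each vector is reduced to a small integer key, which could live in an ordinary indexed column of a traditional database.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(vec, hyperplanes):
    """Pack one bit per hyperplane: 1 if vec is on its non-negative side."""
    key = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vec))
        key = (key << 1) | (1 if dot >= 0 else 0)
    return key


def build_index(vectors, hyperplanes):
    """Group record names by hash key (stand-in for an indexed integer column)."""
    buckets = defaultdict(list)
    for name, vec in vectors.items():
        buckets[lsh_key(vec, hyperplanes)].append(name)
    return buckets


def candidates(query, buckets, hyperplanes):
    """Return records sharing the query's bucket; rerank these exactly afterwards."""
    return buckets.get(lsh_key(query, hyperplanes), [])


# Hypothetical toy data.
planes = make_hyperplanes(dim=3, n_bits=8)
vectors = {"a": [1.0, 2.0, 3.0], "b": [-1.0, -2.0, -3.0]}
index = build_index(vectors, planes)
print(candidates([2.0, 4.0, 6.0], index, planes))
```

Vectors at a small angle to each other agree on most bits, so they tend to share a bucket. A single hash table can still miss near neighbours, which is why practical setups usually use several tables with independent hyperplanes and then rerank the candidates with the exact distance.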

powerexcess@alien.top · 2 years ago

Isn’t KDB a vector DB? It’s a staple of finance.