Why We Can't Use Elasticsearch as Primary Datastore



If you deal with a lot of data and have limited resources, Elasticsearch is not a good option to rely upon. It has no safeguards against overrunning its resources, and it is all too easy to exhaust them.

For example, if your maximum heap size is 4 GB and Elasticsearch asks for 4.1 GB during a bulk load, all previously loaded content can become corrupted.
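
One way to reduce the risk of a single oversized bulk request is to split the load into batches with a fixed byte budget before sending anything. The snippet below is only a minimal sketch of that idea: the helper name `chunk_by_bytes` and the 200-byte budget are our own illustrative assumptions, it does not call Elasticsearch at all, and a byte budget on the request body is not a guarantee about heap usage on the server.

```python
import json
from typing import Iterable, Iterator


def chunk_by_bytes(docs: Iterable[dict], max_bytes: int) -> Iterator[list[str]]:
    """Group documents into batches whose serialized size stays within max_bytes.

    Hypothetical helper for illustration; each yielded batch is a list of
    JSON lines that could be joined with newlines into one request body.
    """
    batch: list[str] = []
    batch_bytes = 0
    for doc in docs:
        line = json.dumps(doc)
        size = len(line.encode("utf-8")) + 1  # +1 for the newline separator
        if size > max_bytes:
            # A single document larger than the budget cannot be batched safely.
            raise ValueError("single document exceeds the batch budget")
        if batch and batch_bytes + size > max_bytes:
            # Adding this document would overrun the budget: flush first.
            yield batch
            batch, batch_bytes = [], 0
        batch.append(line)
        batch_bytes += size
    if batch:
        yield batch


# Ten small sample documents and a deliberately tiny budget (assumed values).
docs = [{"id": i, "body": "x" * 50} for i in range(10)]
batches = list(chunk_by_bytes(docs, max_bytes=200))
print(len(batches), "batches")
```

Each batch can then be sent as its own bulk request, so a failure affects one small batch rather than the whole load.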

Elasticsearch is good as long as you work within the bounds of its intended usage. Outside those bounds, it doesn’t fail gracefully, so it is not a good idea to use it as your primary data store. The points below give more reasons.

  • You probably won’t benefit from it at all if you don’t have a multi-node cluster. Documents are stored in indices, which are split into shards spread across the cluster’s nodes. An index is a collection of documents with similar characteristics, and a document is a JSON object consisting of various fields. On a single node, ES would have to scan all the documents serially, and search queries would become too slow to handle.
  • Since it stores JSON documents, it is non-relational and schema-less. This means no migrations, but also no relational queries such as joins across tables. Elasticsearch comes with certain restrictions.
  • Deep pagination is another problem. Because Elasticsearch is distributed, it needs a lot of memory to serve distant pages. It has to pull every document up to and including the requested page into memory, sort them, and only then return the documents you asked for. This is not a good option when you are searching through a lot of documents.
  • Documentation is written in a tutorial style. It assumes that you will use Elasticsearch precisely the way its developers intended. Lots of obscure uses are supported, but only in theory. In these cases, the relevant information is either buried deep in the documentation or non-existent. You will have no complaints if you just want to index some data and retrieve it later using straightforward queries. But if you wish to take complex SQL and turn it into Elasticsearch queries, expect complications.
  • Default settings should not be trusted in this context. They appear to work fine until you run into a significant problem. For example, the buffer used for handling POST data has a default size limit of 100 MB. This is unreasonable for real workloads, yet it goes unnoticed because everything works fine on tiny test loads. Likewise, if you have a multi-node cluster, ignore the comment saying that nodes= 3 is reasonable for systems larger than one node. Consider that comment useless here.
  • Security is everyone’s priority, and Elasticsearch does not provide it. It offers no authentication or access control functionality, so anyone who can connect to a cluster can make any request. There is also no support for transactions or for processing on data manipulation.
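
To make the deep pagination point above more concrete, here is a small pure-Python simulation. It is a simplified model under our own assumptions, not Elasticsearch code: the function `paginate`, the five shards, and the random scores are all made up for illustration. The idea it shows is that a shard cannot know which of its hits belong on the requested page, so each shard hands over its top (offset + size) hits and the coordinating node has to hold and sort all of them.

```python
import heapq
import random


def paginate(shards, offset, size):
    """Return one page of hits plus the number of hits the coordinator held.

    Hypothetical model: every shard is just a list of relevance scores.
    """
    window = offset + size
    # Each shard must return its own top (offset + size) hits, because any
    # of them might end up on the requested page after the global merge.
    per_shard = [heapq.nlargest(window, shard) for shard in shards]
    candidates = [hit for hits in per_shard for hit in hits]
    # The coordinator sorts everything and throws away all but one page.
    merged = sorted(candidates, reverse=True)
    return merged[offset:offset + size], len(candidates)


random.seed(0)
# Five shards with 20,000 scored documents each (assumed sizes).
shards = [[random.random() for _ in range(20_000)] for _ in range(5)]

first_page, held_first = paginate(shards, offset=0, size=10)
deep_page, held_deep = paginate(shards, offset=10_000, size=10)
print(held_first, held_deep)  # prints "50 50050"
```

Both calls return ten hits, but the deep page forces the coordinator to hold about a thousand times more candidates, and that number grows with both the page depth and the number of shards.
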
Bottom Line

If you are willing to deal with the troubles described above, then by all means go ahead with Elasticsearch. But if you ask us, we would strongly recommend using something else if you deal with a large amount of data.