Testing full-text search

A bemusing problem landed in my hands not too long ago, and after spending some time thinking about it, I still can’t figure out an elegant way of dealing with it.

I have a piece of code that handles full-text search queries, and it appears to work. But I need to write proper tests for it, and that’s where I’m stuck.

What’s the best way of testing full-text search?

Without getting too deep into the details and semantics, let’s say I have some kind of in-memory DB, and I need to write a set of generative tests that exercise its full-text query capabilities.

Let me use a simple Elasticsearch query as an example:

    {
      "query": {
        "query_string": {
          "query": "*",
          "fields": [],
          "default_operator": "and"
        }
      }
    }

How do we thoroughly test these kinds of queries, with different values of query, random fields, and default_operator set to either OR or AND?

  • Do I first need to generate a bunch of documents, and then based on that data, find a way to generate a bunch of queries?

  • Or do I first generate a bunch of different queries and then, based on those, somehow generate documents that would (or wouldn’t) match them?
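To make the first option concrete, here is a minimal sketch of how it might look: generate random documents, derive queries from them, and compare the search results against a deliberately naive brute-force matcher that acts as the oracle. Everything in it is an assumption made for illustration: the vocabulary, the field names, the simplified query shape (a list of terms instead of a query_string expression), and TinyIndex, which is only a stand-in for the real in-memory DB. The AND/OR semantics in reference_match are also assumed, not taken from Elasticsearch.

```python
import random

VOCAB = ["alpha", "beta", "gamma", "delta", "epsilon"]  # assumed vocabulary
FIELDS = ["title", "body", "tags"]                      # assumed field names


def random_document(rng):
    """Build a document: each field holds zero to three distinct words."""
    return {f: rng.sample(VOCAB, rng.randint(0, 3)) for f in FIELDS}


def random_query(rng, docs):
    """Derive a query from the generated data so that some queries can match."""
    doc = rng.choice(docs)
    present = sorted({w for words in doc.values() for w in words})
    # Half of the time, draw terms from a real document; otherwise from VOCAB.
    source = present if present and rng.random() < 0.5 else VOCAB
    terms = rng.sample(source, rng.randint(1, min(3, len(source))))
    fields = rng.sample(FIELDS, rng.randint(0, len(FIELDS)))  # [] = all fields
    operator = rng.choice(["and", "or"])
    return {"terms": terms, "fields": fields, "default_operator": operator}


def reference_match(doc, query):
    """Naive oracle: a brute-force statement of the (assumed) semantics."""
    fields = query["fields"] or FIELDS
    words = {w for f in fields for w in doc.get(f, [])}
    hits = [t in words for t in query["terms"]]
    return all(hits) if query["default_operator"] == "and" else any(hits)


class TinyIndex:
    """Hypothetical stand-in for the system under test: an inverted index."""

    def __init__(self, docs):
        self.postings = {}  # (field, word) -> set of document ids
        for doc_id, doc in enumerate(docs):
            for field, words in doc.items():
                for word in words:
                    self.postings.setdefault((field, word), set()).add(doc_id)

    def search(self, query):
        fields = query["fields"] or FIELDS
        # For each term, collect the documents containing it in any field.
        per_term = [
            set().union(*(self.postings.get((f, t), set()) for f in fields))
            for t in query["terms"]
        ]
        if query["default_operator"] == "and":
            return set.intersection(*per_term)
        return set.union(*per_term)


def check_search(seed, n_docs=20, n_queries=50):
    """Compare the index against the oracle on randomly generated data."""
    rng = random.Random(seed)
    docs = [random_document(rng) for _ in range(n_docs)]
    index = TinyIndex(docs)
    for _ in range(n_queries):
        query = random_query(rng, docs)
        expected = {i for i, d in enumerate(docs) if reference_match(d, query)}
        # Report the seed and query so a failure can be reproduced.
        assert index.search(query) == expected, (seed, query)
    return True


if __name__ == "__main__":
    for seed in range(100):
        check_search(seed)
```

The oracle only helps if it is simple enough to trust by inspection and is written independently of the code under test. A property-based testing library such as Hypothesis could replace the hand-rolled generators and would also shrink failing cases, but the overall shape would stay the same.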

Has anyone done anything similar before?

Of course, I can make a few fixtures and write a few dozen example-based tests. But I’m sure that’s a guaranteed way of missing something.