Datastore: Handling Heavy Insertions
Note: This is a case study to share what I faced. It might not be new to you, and there may be a better solution than the one I found. (You can share your solution here 😊)
At first sight, Datastore looks very easy to work with. And it is, if insertions happen at a low rate. But when insertions happen at a high rate, you run into issues that arise not because your program is faulty, but because you haven't read the (long, detailed) Datastore documentation, which describes restrictions you may never have heard of before.
There are two kinds of timeouts:
1. Call error 11: Deadline exceeded
2. API error 5 (datastore_v3: TIMEOUT)
Call error 11: Deadline exceeded
This kind of timeout happens when your datastore operation takes more than 60 seconds to finish.
For example, you insert too much data with a single PutMulti call, and the write takes more than 60 seconds.
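One way to stay under that 60-second deadline is to split one huge PutMulti into several smaller calls. A minimal sketch of the batching logic, assuming a hypothetical `chunk` helper and an illustrative batch size of 25 (outside a transaction you can usually go larger; the actual datastore call is only hinted at in a comment):

```go
package main

import "fmt"

// batchSize caps how many entities go into one PutMulti call.
// 25 is an illustrative choice, not a datastore requirement.
const batchSize = 25

// chunk splits the pending items into batches of at most n.
func chunk(items []int, n int) [][]int {
	var out [][]int
	for len(items) > 0 {
		end := n
		if len(items) < end {
			end = len(items)
		}
		out = append(out, items[:end])
		items = items[end:]
	}
	return out
}

func main() {
	// 60 pending entities split into PutMulti-sized batches of 25, 25, 10.
	for _, b := range chunk(make([]int, 60), batchSize) {
		// In a real app each batch would go to something like
		// datastore.PutMulti(ctx, keysFor(b), valsFor(b))
		// -- keysFor/valsFor are hypothetical helpers.
		fmt.Println(len(b))
	}
}
```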
API error 5 (datastore_v3: TIMEOUT)
This one mostly occurs due to write contention: it happens when you attempt to write to a single entity group too quickly. Writes to a single entity group are serialized by the App Engine datastore, so there's a limit on how quickly you can update one entity group. In general, this works out to somewhere between 1 and 5 updates per second.
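A common workaround for write contention is to spread writes across several entity groups instead of one, as in the sharded-counter pattern from the GAE docs. A rough sketch of just the key-naming part (`numShards` and `shardKeyName` are my own illustrative names, not a datastore API):

```go
package main

import (
	"fmt"
	"math/rand"
)

// numShards is an illustrative choice; more shards allow more
// concurrent writes before contention kicks in.
const numShards = 20

// shardKeyName picks a random shard for a logical entity so that
// successive writes land in different entity groups instead of
// serializing on a single one.
func shardKeyName(base string) string {
	return fmt.Sprintf("%s-shard%d", base, rand.Intn(numShards))
}

func main() {
	// Hypothetical usage: each write would build its key with
	// datastore.NewKey(ctx, "CounterShard", shardKeyName("hits"), 0, nil).
	fmt.Println(shardKeyName("hits"))
}
```

Reads then have to aggregate over all shards, which is the usual trade-off of this pattern.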
My Case
In my case, it was actually worse, because I was doing exactly what the GAE datastore documentation recommends against.
The most common example of this occurs when an app rapidly inserts a large number of entities of the same kind with sequential IDs. In this case, most inserts hit the same range of the same tablet, and the single tablet server is overwhelmed with writes. Most apps never have to worry about this: it only becomes a problem at write rates of several hundred queries per second and above (and the chances are high when you use goroutines to speed up your application). If this does affect your app, the easiest solution is to use more evenly distributed IDs instead of the auto-allocated ones.
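"More evenly distributed IDs" can be as simple as random string key names, so inserts scatter across tablet ranges instead of piling up at the end of a sequential range. A minimal sketch, assuming a hypothetical `randomKeyName` helper (the `datastore.NewKey` usage in the comment is illustrative):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// randomKeyName returns an evenly distributed string ID so inserts
// spread across tablet ranges rather than hammering one hot range.
func randomKeyName() string {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

func main() {
	// Hypothetical usage:
	// key := datastore.NewKey(ctx, "Entity", randomKeyName(), 0, nil)
	fmt.Println(randomKeyName())
}
```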
(So, I got both kinds of errors in the system, and as a solution I had to slow down insertions, because changing the ID scheme was not an option for me.)
A transaction is a set of Google Cloud Datastore operations on one or more entities in up to 25 entity groups. Each transaction is guaranteed to be atomic, which means that transactions are never partially applied. Either all of the operations in the transaction are applied, or none of them is applied.
I have highlighted 25 here, because that number drove me crazy twice in five days.
The first time, I was using a PutMulti of 25+ entities of kind X plus 2 related entities, and I got an error about exceeding the entity count. So I fixed it by batching the kind-X entities into batches of 25. With that, everything was fine (as per my assumption) 😅.
After 48 hours, I got the same error again.
Then I realised that I also had to include those 2 related entities in the count, and was actually trying to process 27 entities per transaction. Luckily, it was the first incident of processing 23+ kind-X entities in the production environment, so it was quickly fixed before further such incidents.
So the lesson: count every entity that will be processed as a group in a single transaction, related entities included, before choosing a batch size.
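That accounting can be sketched as a tiny helper. Here `relatedPerTxn = 2` reflects my schema (the 2 related entities written alongside each batch); adjust it for yours:

```go
package main

import "fmt"

// maxGroups is the Cloud Datastore limit on entity groups in one transaction.
const maxGroups = 25

// relatedPerTxn counts the entities that ride along in every transaction;
// 2 matches my case and is an assumption for any other schema.
const relatedPerTxn = 2

// safeBatchSize returns how many kind-X entities fit in one transaction
// once the related entities are counted in.
func safeBatchSize() int {
	return maxGroups - relatedPerTxn
}

func main() {
	fmt.Println(safeBatchSize()) // prints 23, not 25
}
```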