Datastore: Handling heavy insertions


Note: This is a case study sharing what I faced. It might not be new to you, and you may have a better solution than mine. (You can share your solution here 😊)

At first sight, Datastore looks very easy to work with. And it is, as long as insertions happen at a low rate. But when insertions are done at a high rate, you run into issues that are caused not by a faulty program, but by not having read the (long, detailed) Datastore documentation, which describes restrictions you may never have heard of before.



Timeout Problems


There are two kinds of timeouts:
1. Call error 11: Deadline exceeded
2. API error 5 (datastore_v3: TIMEOUT)


Call error 11: Deadline exceeded

This kind of timeout happens when a datastore operation takes more than 60 seconds to finish.
For example, inserting too much data with a single PutMulti call can easily take longer than 60 seconds.
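One way to stay under the deadline (my own sketch, not something from the official docs) is to split one huge PutMulti into smaller batches. The helper below is plain Go; the batch size of 500 and all names are illustrative assumptions:

```go
package main

import "fmt"

// batches splits items into slices of at most size elements, so each
// PutMulti call stays small enough to finish well before the 60s deadline.
// The batch size used below is illustrative, not a documented limit.
func batches[T any](items []T, size int) [][]T {
	var out [][]T
	for len(items) > size {
		out = append(out, items[:size])
		items = items[size:]
	}
	if len(items) > 0 {
		out = append(out, items)
	}
	return out
}

func main() {
	keys := make([]int, 1050) // stand-in for []*datastore.Key
	for i, b := range batches(keys, 500) {
		// In real code: client.PutMulti(ctx, keyBatch, entityBatch)
		fmt.Printf("batch %d: %d items\n", i, len(b))
	}
}
```

Each batch then becomes one bounded PutMulti call instead of one giant one.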


API error 5 (datastore_v3: TIMEOUT)

This mostly occurs due to write contention.

It happens when you attempt to write to a single entity group too quickly. Writes to a single entity group are serialized by the App Engine datastore, so there is a limit on how quickly you can update one entity group. In general, this works out to somewhere between 1 and 5 updates per second.


My Case 

In my case, it was actually worse, because I was doing exactly what the GAE Datastore documentation recommends not to do.

The most common example of this occurs when an app rapidly inserts a large number of entities of the same kind with sequential IDs. In this case, most inserts hit the same range of the same tablet, and the single tablet server is overwhelmed with writes. Most apps never have to worry about this: it only becomes a problem at write rates of several hundred queries per second and above (chances are high when you use goroutines to speed up the application). If this does affect your app, the easiest solution is to use more evenly distributed IDs instead of the auto-allocated ones.

(So I hit both kinds of errors in the system, and as a solution I had to slow down insertions, because I had no option to move away from the auto-allocated IDs.)
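For apps that can change their keys, the "more evenly distributed IDs" fix can look like the sketch below: derive the key name by hashing a stable source ID, so consecutive IDs land in different key ranges instead of hammering one tablet. The function name, kind name, and 8-byte prefix length are all my own assumptions:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// scatteredName derives a key name from a stable source ID by hashing it,
// so sequential source IDs map to well-spread key names instead of one
// contiguous, contended range.
func scatteredName(sourceID string) string {
	sum := sha256.Sum256([]byte(sourceID))
	return fmt.Sprintf("%x", sum[:8]) // 16 hex chars, deterministic
}

func main() {
	for _, id := range []string{"order-1", "order-2", "order-3"} {
		// In real code: key := datastore.NameKey("Order", scatteredName(id), nil)
		fmt.Println(id, "->", scatteredName(id))
	}
}
```

Because the name is deterministic, re-inserting the same source ID still hits the same entity, so this stays idempotent.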



Transactions Related Issues



A transaction is a set of Google Cloud Datastore operations on one or more entities in up to 25 entity groups. Each transaction is guaranteed to be atomic, which means that transactions are never partially applied. Either all of the operations in the transaction are applied, or none of them is applied.

I have highlighted 25 here because it is the number that drove me crazy twice in five days.

The first time, I was doing a PutMulti of 25+ entity X plus 2 related entities, and I got an error for exceeding the entity count. So I fixed it by batching entity X in batches of 25. With that, everything was fine (or so I assumed) 😅.

After 48 hours I got the same error again.

Then I realised that I had to include those 2 related entities in the count as well, and was actually trying to process 27 entities. Luckily, it was the first incident of processing 23+ entity X in the production environment, so it was quickly fixed before further such incidents.

Here is an example of counting the entities processed as a group in a single transaction.


Problematic code

// entityX := array of 25 entityX elements
// entityXKeys := array of 25 datastore keys for entityX elements

_, err := client.RunInTransaction(ctx, func(tx *datastore.Transaction) error {
        if _, err := tx.PutMulti(entityXKeys, entityX); err != nil { // 25 entities
                return err
        }
        if _, err := tx.Put(key1, &ele1); err != nil { // 1 entity
                return err
        }
        _, err := tx.Put(key2, &ele2) // 1 entity

        // In total we are processing 25 + 1 + 1 = 27 entities in one
        // transaction, which is not allowed by Datastore.
        return err
})

Solution

// Instead of inserting 25 elements at once, use batches of an appropriate
// size (in this case 22), so the entity count inside the transaction stays
// at or below 25.
for {
        // entityX22 := next batch of up to 22 entityX elements
        // entityXKeys22 := datastore keys for that batch
        _, err := client.RunInTransaction(ctx, func(tx *datastore.Transaction) error {
                if _, err := tx.PutMulti(entityXKeys22, entityX22); err != nil { // 22 entities
                        return err
                }
                if _, err := tx.Put(key1, &ele1); err != nil { // 1 entity
                        return err
                }
                _, err := tx.Put(key2, &ele2) // 1 entity

                // Now we process 22 + 1 + 1 = 24 entities per transaction,
                // which stays within the limit of 25.
                return err
        })
        // break out of the loop once all batches of entityX are written
}
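To avoid re-learning this the hard way, it can help to count the entities before calling RunInTransaction at all. The guard below is my own sketch; it mirrors the 25-count rule from the story above (the official docs phrase the limit as "up to 25 entity groups" per transaction, so treat this as a conservative sanity check rather than the exact rule):

```go
package main

import "fmt"

// maxPerTxn mirrors the 25-count limit from the incident above.
const maxPerTxn = 25

// txnBudget sums the entity counts you plan to touch in one transaction
// and fails fast, before RunInTransaction is even called.
func txnBudget(counts ...int) error {
	total := 0
	for _, c := range counts {
		total += c
	}
	if total > maxPerTxn {
		return fmt.Errorf("transaction would touch %d entities, limit is %d", total, maxPerTxn)
	}
	return nil
}

func main() {
	// The fixed batch: 22 entityX + 2 related entities = 24, fits.
	fmt.Println(txnBudget(22, 1, 1))
	// The original bug: 25 entityX + 2 related entities = 27, rejected.
	fmt.Println(txnBudget(25, 1, 1))
}
```

Calling this once per batch turns the "forgot the 2 related entities" mistake into an immediate, descriptive error instead of a production incident two days later.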

Conclusion

When you are going to do heavy insertion into Datastore, keep in mind the transaction insertion limit and the timeout criteria defined by the GAE Datastore API.
