
Datastore: Handling heavy insertions


Note: This is a case study sharing what I faced. It might not be new to you, and you may have a better solution than the one I found. (You can share your solution here 😊)

At first sight, datastore looks very easy to work with. And it is, as long as insertions happen at a low rate. But when insertions are done at a high rate, you run into issues that are caused not by a faulty program, but by not having read the (long, detailed) datastore documentation, which describes restrictions you may never have heard of before.



Timeout Problems


There are two kinds of timeouts:
1. Call error 11: Deadline exceeded
2. API error 5 (datastore_v3: TIMEOUT)


Call error 11: Deadline exceeded

This kind of timeout happens when your datastore operation takes more than 60 seconds to finish.
For example, you are inserting so much data with a single PutMulti call that it takes more than 60 seconds to complete.
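One way around it, sketched below, is to split the oversized PutMulti into smaller batches so each call finishes well within the deadline. This is only a minimal sketch, assuming keys and entities are pre-built slices; the batch size of 500 is an assumed value, not something prescribed.

// A minimal sketch (not the exact code from my app): split one huge
// PutMulti into smaller batches so each datastore call finishes well
// under the 60s deadline.
const batchSize = 500 // assumed value; tune it for your payload

for start := 0; start < len(keys); start += batchSize {
    end := start + batchSize
    if end > len(keys) {
        end = len(keys)
    }
    if _, err := client.PutMulti(ctx, keys[start:end], entities[start:end]); err != nil {
        return err
    }
}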


API error 5 (datastore_v3: TIMEOUT)

This mostly occurs due to write contention.

It happens when you attempt to write to a single entity group too quickly. Writes to a single entity group are serialized by the App Engine datastore, and thus there's a limit on how quickly you can update one entity group. In general, this works out to somewhere between 1 and 5 updates per second.


My Case 

In my case, it was actually worse, because I was doing exactly what the GAE datastore documentation recommends not to do.

The most common example of this occurs when an app rapidly inserts a large number of entities of the same kind with sequential IDs. In this case, most inserts hit the same range of the same tablet, and the single tablet server is overwhelmed with writes. Most apps never have to worry about this: it only becomes a problem at write rates of several hundred queries per second and above (chances are high when you use goroutines to speed up your application). If this does affect your app, the easiest solution is to use more evenly distributed IDs instead of the auto-allocated ones.
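For illustration only, here is a minimal sketch of what "more evenly distributed IDs" could look like with the Go client: naming the keys yourself with random strings instead of relying on auto-allocation. The randomID helper and the EntityX kind name are hypothetical, not from my code.

import (
    "crypto/rand"
    "encoding/hex"

    "cloud.google.com/go/datastore"
)

// randomID is a hypothetical helper: it returns a random hex string
// to use as a key name, so inserts are spread over the whole key
// range instead of hammering one tablet with sequential IDs.
func randomID() string {
    b := make([]byte, 16)
    if _, err := rand.Read(b); err != nil {
        panic(err)
    }
    return hex.EncodeToString(b)
}

// newEntityXKey builds a named key instead of an incomplete key,
// so datastore does not auto-allocate a (near-)sequential ID.
func newEntityXKey() *datastore.Key {
    return datastore.NameKey("EntityX", randomID(), nil)
}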

(So, I hit both kinds of errors in the system, and as a solution I had to slow down insertions, because moving away from the auto-allocated IDs was not an option for me.)
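"Slow down insertions" in my case simply meant pacing the writes. Below is a minimal sketch of that idea, assuming keys and entities are pre-built slices; the 250ms interval (roughly 4 writes per second) is an assumed value chosen to stay inside the 1 to 5 updates per second range mentioned above.

// Throttle the writes so the entity group is never updated faster
// than datastore can serialize them.
ticker := time.NewTicker(250 * time.Millisecond) // ~4 writes/sec, assumed rate
defer ticker.Stop()

for i, key := range keys {
    <-ticker.C // wait for the next tick before each write
    if _, err := client.Put(ctx, key, entities[i]); err != nil {
        return err
    }
}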



Transaction-Related Issues



A transaction is a set of Google Cloud Datastore operations on one or more entities in up to 25 entity groups. Each transaction is guaranteed to be atomic, which means that transactions are never partially applied. Either all of the operations in the transaction are applied, or none of them is applied.

I have highlighted 25 here, because it is the number that drove me crazy twice in 5 days.

The first time, I was using a PutMulti of 25+ entity X plus 2 related entities, and I got an error about exceeding the entity count. So I fixed it by batching entity X in batches of 25. With that, everything was fine (as per my assumption) 😅.

After 48 hours, I got the same error again.

Then I realised that I had to count those 2 related entities as well, and I was actually trying to process 27 entities. Luckily, it was the first incident of processing 23+ entity X in the production environment, so it was quickly fixed before further such incidents.

Below is an example of counting the entities processed as a group in a single transaction.


Problematic code

// entityX := slice of 25 entityX elements
// entityXKeys := slice of 25 datastore keys for the entityX elements

_, err := client.RunInTransaction(ctx, func(tx *datastore.Transaction) error {
    if _, err := tx.PutMulti(entityXKeys, entityX); err != nil { // 25 entities
        return err
    }
    if _, err := tx.Put(key1, &ele1); err != nil { // 1 entity
        return err
    }
    if _, err := tx.Put(key2, &ele2); err != nil { // 1 entity
        return err
    }
    // In total we are processing 25 + 1 + 1 = 27 entities in one transaction,
    // which is not allowed by datastore.
    return nil
})

Solution

// Instead of inserting 25 elements at once, use batches of an appropriate
// size (in this case 22), so the entity count inside the transaction
// stays at or below 25.
const batchSize = 22

for start := 0; start < len(entityX); start += batchSize {
    end := start + batchSize
    if end > len(entityX) {
        end = len(entityX)
    }

    _, err := client.RunInTransaction(ctx, func(tx *datastore.Transaction) error {
        if _, err := tx.PutMulti(entityXKeys[start:end], entityX[start:end]); err != nil { // at most 22 entities
            return err
        }
        if _, err := tx.Put(key1, &ele1); err != nil { // 1 entity
            return err
        }
        if _, err := tx.Put(key2, &ele2); err != nil { // 1 entity
            return err
        }
        // Now each transaction processes at most 22 + 1 + 1 = 24 entities,
        // which stays within datastore's limit of 25.
        return nil
    })
    if err != nil {
        return err // handle the error (retry, log, abort, ...)
    }
}

Conclusion

When you are going to do heavy insertions into datastore, keep in mind the per-transaction entity limit and the timeout criteria of the GAE datastore API.

