Discussion:
[google-appengine] How to delete 800 mln records from Datastore?
Kuba Włodarczyk
2018-11-26 14:21:29 UTC
I've got around 800 million (!) records in my Datastore entities. How can I
remove them quickly?
I'm trying to delete them via deferred tasks (Python script), but it is
extremely slow...
I would appreciate any help. Thanks.
--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+***@googlegroups.com.
To post to this group, send email to google-***@googlegroups.com.
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/f2a7fc27-8f3a-4250-b5b9-2253e9b471fe%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Vitaly Bogomolov
2018-11-26 19:27:56 UTC
Hi Kuba.

Free quota per day for entity deletes is 20K records. So for free you will
be deleting data for 40K days.

Or you can delete this data in one day and be charged about $800 ($0.02 for
every 20K deletes over quota) + additional costs for running instances.

https://cloud.google.com/appengine/pricing

WBR, Vitaly.
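
A quick check of this arithmetic, as a minimal sketch. The figures (20K free deletes per day, $0.02 per 20K deletes over quota) are the ones stated in the post and are treated here as assumptions, not verified prices; check them against the pricing page linked above. The function names are hypothetical.

```python
# Back-of-the-envelope estimate for deleting a large number of entities.
# All rates below are the figures quoted in the post, treated as assumptions.

TOTAL_RECORDS = 800000000      # 800 million entities
FREE_DELETES_PER_DAY = 20000   # free daily delete quota, as stated in the post
PRICE_PER_BLOCK = 0.02         # USD per block of deletes, as stated in the post
BLOCK_SIZE = 20000             # deletes covered by one PRICE_PER_BLOCK charge


def days_on_free_quota(total, free_per_day):
    """Days needed if only the free daily quota is used (rounded up)."""
    return -(-total // free_per_day)  # ceiling division


def one_day_cost(total, free_per_day, price_per_block, block_size):
    """Cost in USD of deleting everything in one day, beyond the free quota."""
    billable = max(total - free_per_day, 0)
    return billable / float(block_size) * price_per_block


if __name__ == "__main__":
    print(days_on_free_quota(TOTAL_RECORDS, FREE_DELETES_PER_DAY))
    print(round(one_day_cost(TOTAL_RECORDS, FREE_DELETES_PER_DAY,
                             PRICE_PER_BLOCK, BLOCK_SIZE), 2))
```

With the quoted rates this comes to 40,000 days on the free quota, or about $800 for a single day, before instance costs.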
Kuba Włodarczyk
2018-11-26 20:13:20 UTC
Thank you for your answer. That clarifies a lot. But cost aside, how can I
do this in, let's say, one day (the expensive option)? I prefer Python.
Vitaly Bogomolov
2018-11-26 22:09:51 UTC
Something like this. The code is not tested, and the constants may be
different, except 800M ;)

from google.appengine.api.taskqueue import Queue, Task
from google.appengine.ext import ndb

QUEUE = Queue('default')


def backend_function():
    # Each pass enqueues 20 tasks; each task deletes 100 records.
    for j in xrange(800000000 // (100 * 20)):
        QUEUE.add([Task(url='/remove_100_records_handler')
                   for i in range(20)])


def remove_100_records_handler():
    # Delete 100 records in 5 batches of 20 keys each.
    # YourDatastoreTable is a placeholder for your ndb model class.
    for i in range(100 // 20):
        ndb.delete_multi(
            YourDatastoreTable.query().fetch(20, keys_only=True))

WBR, Vitaly
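
To see how the constants in the sketch above relate, here is a small stand-alone calculation. It does not call App Engine at all; `plan_fanout` is a hypothetical helper, and the defaults (100 records per task, 20 tasks per `Queue.add()` call) are simply the ones from the snippet.

```python
def plan_fanout(total_records, records_per_task=100, tasks_per_add=20):
    """Return (tasks, add_calls) needed to cover total_records.

    tasks     -- number of queue tasks, each deleting records_per_task records
    add_calls -- number of Queue.add() calls, each enqueuing tasks_per_add tasks
    """
    tasks = -(-total_records // records_per_task)  # ceiling division
    add_calls = -(-tasks // tasks_per_add)         # ceiling division
    return tasks, add_calls


if __name__ == "__main__":
    tasks, add_calls = plan_fanout(800000000)
    print(tasks, add_calls)
```

For 800M records this gives 8 million tasks enqueued in 400,000 `Queue.add()` calls, which matches the loop bound in the snippet.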
'Mohammad I (Cloud Platform Support)' via Google App Engine
2018-11-26 23:50:22 UTC
Hello Kuba,

You can delete entities in bulk from Cloud Datastore using Cloud
Dataflow[1] which is a managed service for developing and executing data
processing workflows. Please look at this section[2] for best practices for
deletion from Cloud Datastore.

[1] https://cloud.google.com/datastore/docs/bulk-delete

[2] https://cloud.google.com/datastore/docs/best-practices#deletions
Attila-Mihaly Balazs
2018-11-27 05:07:50 UTC
AFAIK the simplest way to delete them, which requires *no code to be
written*, is the deprecated but still working Datastore Admin. In your
Google Cloud Console, go to Datastore > Admin, click on "Open Datastore
Admin", select the entity kind you want to delete, and click "Delete
Entities". This will kick off a distributed, fan-out MapReduce job which
will delete the entities in a couple of hours.

Of course, as Vitaly said, this will cost you money.

Attila
Kuba Włodarczyk
2018-11-27 11:42:31 UTC
Thanks Attila-Mihaly,

Cost is not a problem if it is around what Vitaly said. Regarding your
solution, I've tried "Datastore Admin", but the delete tasks behave weirdly.
The job in the list says "(0 steps completed, 1 active)".
Going to the details gives me this (please see the attached screenshot).

Mohammad, thanks for your suggestion. I've tried this as well; however, I
couldn't track progress, so I stopped the task after 7 hours. I also wasn't
sure how to set up the task, as I couldn't find any guide. I entered a query
like "SELECT * from Transaction". Is that OK? Transaction is the entity kind
I would like to remove entirely.


Jakub



'Amit (Google Cloud Support)' via Google App Engine
2018-11-28 23:01:46 UTC
Hello Kuba,

The link [1] that Mohammad shared provides the steps for setting up a Cloud
Dataflow job to delete entities in bulk. I can see you are already on the
right track. After selecting the ‘Datastore to Datastore Delete’ template,
enter that query if ‘Transaction’ is your entity name. You can monitor the
progress using the Cloud Dataflow Monitoring Interface. For more details,
please check this link [2].

[1] https://cloud.google.com/datastore/docs/bulk-delete#deleting_entities_in_bulk

[2] https://cloud.google.com/dataflow/docs/guides/using-monitoring-intf
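
As a rough sketch of what launching such a job could look like from Python, the snippet below only builds a `gcloud` command line; it does not run it. The template path and the parameter names (`datastoreReadGqlQuery`, `datastoreReadProjectId`, `datastoreDeleteProjectId`) are given from memory of the bulk-delete template documentation in [1] and should be treated as assumptions to verify there. `my-project`, the job name, and the helper `build_bulk_delete_command` are hypothetical placeholders.

```python
# Assumed location of the Google-provided template; verify against [1].
TEMPLATE = "gs://dataflow-templates/latest/Datastore_to_Datastore_Delete"


def build_bulk_delete_command(project_id, kind, job_name="delete-entities"):
    """Build (but do not run) a gcloud command for a bulk-delete job."""
    # Same query shape as in the thread: select every entity of one kind.
    gql = "SELECT * FROM {}".format(kind)
    # --parameters takes comma-separated key=value pairs, so this simple
    # sketch only works for queries that contain no commas.
    params = ",".join([
        "datastoreReadGqlQuery={}".format(gql),
        "datastoreReadProjectId={}".format(project_id),
        "datastoreDeleteProjectId={}".format(project_id),
    ])
    return [
        "gcloud", "dataflow", "jobs", "run", job_name,
        "--gcs-location", TEMPLATE,
        "--parameters", params,
    ]


if __name__ == "__main__":
    print(" ".join(build_bulk_delete_command("my-project", "Transaction")))
```

To actually launch the job, the returned list could be passed to `subprocess.check_call`; progress would then show up in the Dataflow monitoring interface mentioned above.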