Begin AI Documentation

Algorithm: Any

Use Cases: All

Batch Data Processing is the next step after you’ve uploaded your schema to Begin. This step processes your data, generates digital signatures, and uploads them to Begin’s platform. These signatures are used to create your customized A.I. model, trained on encrypted and secure versions of your data.

For an example of processing a dataset, see this notebook on GitHub, which shows how Goodreads’ public dataset is handled:

https://github.com/Begin-AI/demo-pysdk/blob/main/goodreads/beginai_sdk%2C%20generate%20signatures%20and%20learn%20from%20data.ipynb

Once you log in to your Begin account, obtain your license key and app ID from the settings page.

Next, navigate to the Schema page and upload your data schema (refer to the Schema guide to learn how to create one).

Once your schema is uploaded, you’ll be taken to the integration code page. You’re ready to start processing.

Installing Begin’s Library

First, install the beginai package with pip:


pip install beginai

This will initiate the install process.

☝

If using virtual environments, don’t forget to add beginai to the requirements.txt file as well.

Add Your Account Credentials

Next, open your Python editor, import the library, and initialize it with the app_id and license_key found in the team settings menu of your account.


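import beginai as bg  # the package installed above with `pip install beginai`

# APP_ID and LICENSE_KEY are placeholders for the values on your settings page.
applier = bg.AlgorithmsApplier(app_id=APP_ID, license_key=LICENSE_KEY)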

Code sample: importing and initializing the library. Make sure to replace APP_ID and LICENSE_KEY with your own app ID and license key.
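To keep the license key out of source control, one option is to read both values from environment variables before initializing the library. The sketch below is only an illustration: the variable names BEGIN_APP_ID and BEGIN_LICENSE_KEY and the helper read_begin_credentials are hypothetical and are not part of the beginai library.

```python
import os


def read_begin_credentials(env=None):
    """Return (app_id, license_key) read from environment variables.

    BEGIN_APP_ID and BEGIN_LICENSE_KEY are hypothetical names chosen for
    this sketch; the beginai library does not define them.
    """
    env = os.environ if env is None else env
    names = ("BEGIN_APP_ID", "BEGIN_LICENSE_KEY")
    # Treat unset and empty variables the same way: both are unusable.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["BEGIN_APP_ID"], env["BEGIN_LICENSE_KEY"]


# Usage, following the initialization shown in this section:
#   app_id, license_key = read_begin_credentials()
#   applier = bg.AlgorithmsApplier(app_id=app_id, license_key=license_key)
```

Failing early with a clear message avoids starting a long processing run with an empty key.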

Load Your Data

You’re now ready to load your users’ data from a CSV file.


applier.load_user_data('users.csv', 'user_id_column_name')
applier.learn_from_data()

The two lines above load your CSV into memory. The library then locally applies Begin’s platform-generated instructions, anonymizing your users’ data by converting it to mathematical signatures. These signatures are then submitted to Begin’s platform.
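Because the ID column is passed by name, it can help to confirm that the CSV header actually contains it before loading. The helper below is a minimal sketch using only the Python standard library; missing_columns is a hypothetical name, and it assumes the first row of the file is the header.

```python
import csv


def missing_columns(csv_path, required_columns):
    """Return the required column names that are absent from the CSV header."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(handle), [])
    return [name for name in required_columns if name not in header]


# Usage before loading:
#   absent = missing_columns('users.csv', ['user_id_column_name'])
#   if absent:
#       raise ValueError(f"users.csv is missing columns: {absent}")
#   applier.load_user_data('users.csv', 'user_id_column_name')
```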

Similarly, you can apply Begin’s learning algorithms to the remaining objects and to users’ interactions. For objects:


applier.load_object_data(
  'objects.csv',
  'object_name_as_defined_in_schema',
  'object_id_column_name'
)
applier.learn_from_data()

And for interactions between users and objects:


applier.load_interactions(
  'interactions.csv', 
  'user_id_column',
  'object_name_as_defined_in_schema', 
  'object_id_column',
  'interaction_column'
)
applier.learn_from_data()

⚠️

HEADS UP: Make sure to use the exact name of the object/interaction as defined in the schema.
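One way to catch naming mismatches early is to compare the values in your interactions file against the names you defined in the schema. The sketch below assumes, purely for illustration, that the interaction column stores the interaction name in each row and that you supply the allowed names yourself; unexpected_values is a hypothetical helper, not part of the beginai library.

```python
import csv


def unexpected_values(csv_path, column, allowed):
    """Return the sorted distinct values in `column` that are not in `allowed`."""
    allowed = set(allowed)
    found = set()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            value = row[column]
            # Short rows yield None for missing cells; skip those here.
            if value is not None and value not in allowed:
                found.add(value)
    return sorted(found)


# Usage with names copied from your own schema (placeholders shown):
#   bad = unexpected_values('interactions.csv', 'interaction_column', {'like', 'rate'})
#   if bad:
#       print("Not defined in the schema:", bad)
```

The comparison is case-sensitive on purpose, since the names must match the schema exactly.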

Processing Large Datasets

If you’re processing a large amount of data, we recommend splitting your CSV into multiple smaller CSVs. Every time you call learn_from_data(), the library’s memory is refreshed, so you can load as many CSVs as you like, one after another (the example on GitHub linked above loads about 200 million interactions, split over multiple CSVs).
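The splitting step can be sketched with the Python standard library alone. In this illustration, split_csv is a hypothetical helper (not part of the beginai library) that repeats the header row in every part so that each part can be loaded on its own.

```python
import csv
import os


def split_csv(csv_path, rows_per_file, out_dir):
    """Split csv_path into parts of at most rows_per_file data rows each.

    The header row is repeated in every part. Returns the part paths in order.
    """
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(csv_path))[0]
    part_paths = []
    out_handle = None
    writer = None
    rows_in_part = 0
    with open(csv_path, newline="", encoding="utf-8") as source:
        reader = csv.reader(source)
        header = next(reader, None)
        if header is None:
            return part_paths  # empty input: nothing to split
        try:
            for row in reader:
                # Start a new part on the first row or when the current one is full.
                if out_handle is None or rows_in_part >= rows_per_file:
                    if out_handle is not None:
                        out_handle.close()
                    name = f"{stem}_part{len(part_paths) + 1:04d}.csv"
                    path = os.path.join(out_dir, name)
                    out_handle = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(out_handle)
                    writer.writerow(header)
                    part_paths.append(path)
                    rows_in_part = 0
                writer.writerow(row)
                rows_in_part += 1
        finally:
            if out_handle is not None:
                out_handle.close()
    return part_paths


# Usage: load and learn from one part at a time.
#   for part in split_csv('interactions.csv', 300_000, 'parts'):
#       applier.load_interactions(part, 'user_id_column',
#           'object_name_as_defined_in_schema', 'object_id_column',
#           'interaction_column')
#       applier.learn_from_data()
```

The input is streamed row by row, so the source file never has to fit in memory; the part size of 300,000 rows in the usage comment is only a placeholder.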

☝

How large is “too large”? Our recommendation is: if your laptop can’t handle it, split it. An average laptop can process about 300k records in 30 minutes on a dataset with 30 features.
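As a rough worked example of the figure above, the snippet below extrapolates processing time from the quoted rate of about 300k records in 30 minutes. It assumes that time scales linearly with record count and that your data has a similar number of features; both are assumptions made for this sketch, and real throughput will vary with hardware and schema.

```python
def estimate_minutes(n_records, records_per_batch=300_000, minutes_per_batch=30):
    """Linearly extrapolate processing time from the quoted laptop figure."""
    return n_records * minutes_per_batch / records_per_batch


# 300k records in 30 minutes is 10,000 records per minute,
# so one million records would take roughly 100 minutes at that rate.
print(estimate_minutes(1_000_000))  # 100.0
```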
