AI Olympiad 2019–2020. Online stage

Fortune telling on cards

Is it possible to find out the client's age based on information about his card expenses? We prepared a task based on real banking transactions. When improving its products, the bank uses information about users, including age. This helps to make personalized products that meet the real needs of customers. But does calendar age always correspond to a person’s lifestyle (and purchases)?

The task

Your task is to predict which age group a bank client belongs to, based on information about the client's expenses. You are given training data (train) for constructing features and training models, and test data (test) for evaluating your algorithms. The data has been specially prepared and anonymized, so models can be trained on it while real customer data stays completely secure. A solution consists of your algorithm's predictions on the test data.

Data

To solve the problem, participants were provided with information about the transactions of bank clients, about 27 million records in total.

Each record describes one banking transaction. For each of the ≈20,000 test IDs, participants had to use a trained model to predict which age group the client belongs to.

Two data sets were prepared:

Training set transactions_train.csv, in which the date, amount, type and client ID are known for each transaction;
Test set transactions_test.csv, containing the same fields:

client_id – unique client number;
trans_date – transaction date (the day number in chronological order, counted from a fixed reference date);
small_group – the transaction category (for example, grocery stores, clothing, gas stations, children's goods, etc.);
amount_rur – transaction amount (for anonymization, the amounts were transformed in a way that preserves their structure).
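
One way to turn the transaction fields above into per-client features can be sketched as follows. This is a minimal illustration with pandas, not the organizers' method. The toy rows, the integer category codes, the function name build_client_features, and the feature names (spend_group_*, n_transactions, total_spend, active_days) are all assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for transactions_train.csv; all values are made up.
# In practice: transactions = pd.read_csv("transactions_train.csv")
transactions = pd.DataFrame(
    {
        "client_id": [1, 1, 1, 2, 2],
        "trans_date": [0, 3, 3, 1, 7],
        "small_group": [10, 10, 25, 10, 40],  # assumed integer category codes
        "amount_rur": [100.0, 50.0, 20.0, 300.0, 5.0],
    }
)


def build_client_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per client (hypothetical feature set)."""
    # Total amount spent by each client in each transaction category.
    spend = df.pivot_table(
        index="client_id",
        columns="small_group",
        values="amount_rur",
        aggfunc="sum",
        fill_value=0.0,
    )
    spend.columns = [f"spend_group_{group}" for group in spend.columns]

    # Simple overall activity statistics per client.
    totals = df.groupby("client_id").agg(
        n_transactions=("amount_rur", "size"),
        total_spend=("amount_rur", "sum"),
        active_days=("trans_date", "nunique"),
    )
    return spend.join(totals)


features = build_client_features(transactions)
print(features)
```

The resulting table has one row per client_id, which is the shape a classifier needs, since the age group is predicted per client rather than per transaction.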

Download Данные.zip

Solution Format

For each example in the test set, participants had to predict the age group to which the client belongs. Predictions were submitted to the checking system as a CSV file with two columns:

client_id – client identifier;
bins – age group.
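
A file in this two-column layout could be produced as in the sketch below. The comma delimiter, the header row, the helper name write_submission, and the example predictions are assumptions for illustration; the exact file conventions expected by the checking system are not stated here.

```python
import csv
import io
from typing import Mapping, TextIO


def write_submission(predictions: Mapping[int, int], out: TextIO) -> None:
    """Write one 'client_id,bins' row per client (hypothetical helper)."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["client_id", "bins"])  # assumed header row
    for client_id, age_bin in sorted(predictions.items()):
        writer.writerow([client_id, age_bin])


# Made-up predictions: client_id -> predicted age group.
example_predictions = {42: 3, 7: 0}

buffer = io.StringIO()  # stands in for open("submission.csv", "w", newline="")
write_submission(example_predictions, buffer)
submission_text = buffer.getvalue()
print(submission_text)
```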

The task is multi-class classification with 4 classes, labeled 0 to 3. Solution quality is measured by accuracy: the proportion of test examples for which the age group is predicted correctly.
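
The accuracy metric can be computed in a few lines. The sketch below uses made-up labels, and the function name accuracy is chosen for the example; it is not the checking system's code.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of positions where the predicted class equals the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("accuracy is undefined for empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up example: 3 of the 5 age groups are predicted correctly.
true_bins = [0, 1, 2, 3, 3]
predicted_bins = [0, 1, 1, 3, 0]
score = accuracy(true_bins, predicted_bins)
print(score)
```

With 4 classes, a model that guesses uniformly at random would be expected to score about 0.25, which gives a floor against which to judge a solution.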

Python is the most convenient language for this task, since it has a large number of libraries for data analysis: NumPy, pandas, scikit-learn and others. The Jupyter interactive environment is used as a development tool.

Participants also had access to a basic example solution from the organizers in the form of a Jupyter notebook.
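
The organizers' notebook is not reproduced here. As an independent illustration of how the libraries named above fit together, the sketch below trains a classifier on synthetic data and measures accuracy on a held-out part. The random features, the synthetic labels, the choice of RandomForestClassifier, and all parameter values are assumptions; real features would be built from the transaction files.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a per-client feature matrix (rows are clients).
n_clients, n_features = 400, 6
X = rng.normal(size=(n_clients, n_features))

# Synthetic labels 0..3 derived from the first feature, so there is signal to learn.
y = np.digitize(X[:, 0], bins=[-0.7, 0.0, 0.7])

# Hold out a quarter of the clients to estimate accuracy before submitting.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

valid_pred = model.predict(X_valid)
valid_accuracy = accuracy_score(y_valid, valid_pred)
print(f"validation accuracy: {valid_accuracy:.3f}")
```

Holding out part of the training data matters because the true age groups of the test clients are not available, so a local validation score is the only estimate of accuracy a participant can compute.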

Project organizer: Charitable Foundation “Investment to the Future”, OGRN 1157700017518

The Academy of Artificial Intelligence for Schoolchildren is not an educational service subject to licensing and does not imply the issuance of a state certificate
