Benefit from fresh perspectives. Join as an industry partner.

At Y-DATA, our goal is to create successful partnerships between our students and companies in the industry.
Our partners

Meet the companies we’ve already collaborated with

Benefits

Why become our partner?

01
Get your tasks, whether everyday or complex, handled with a fresh perspective.
02
Get to know talented data scientists.
03
Help the local data science community grow.
Basics

What are industry projects?

Expertise comes with experience. We believe that to truly understand how machine learning algorithms work, Y-DATA students should practice on real-life, full-cycle industry projects as part of their studies.

As one of the key elements of our program, these projects play an important role in turning theoretical knowledge into practical skills.
*Projects are chosen by students based on their preferences. Students are then approved by companies to take on the projects.
Duration: March – June
Scope: ~300 work hours
Team: 2-3 students
Guidance: Experienced mentors provided by Y-DATA
Requirements

What do you need to get started?

01
Data
Pre-collected and ready before the project launch
Pre-cleaned (in fair condition)
Anonymized (or in public domain) or accessible to students on company devices
02
Commitment and time
Availability for steady communication with the Y-DATA team during the project definition process (January – February)
Capacity to attend weekly meetings with the student team (March – June)
03
Project outline
Clear and well-defined project goals
Multi-tier goal structure (minimum of two tiers graded by difficulty)
Sample Project

Project in detail

Finding Genes with Similar Functional Homology
Authors
Eli Birkan
Tal Brender
Y-DATA Mentor
Shani Kotler
Industry Partner
Roy Granit
Compugen is a company focused on predictive drug discovery and development, particularly in cancer treatments.
This project aimed to develop a proof of concept for using Protein Language Models to discover and classify proteins by functionality. The focus was on the 4-Helical Cytokine Clan, which includes proteins characterized by their shape and immune system activity.
Finding unknown genes could lead to new cancer treatments. The challenge lies in identifying remote homologues, as proteins with dissimilar sequences can still belong to the same family. This project explores both traditional methods and advanced machine learning techniques to address this challenge.
Goals
Basic goal: Reproduce Compugen's experiment using the Exon Hypothesis
Advanced goal: Utilize Protein Language Models to classify and discover functionally similar proteins
Methods
  • Repurposed Smith-Waterman algorithm to work with exon lengths and phases.
  • Used ESM2, a transformer-based model trained on 60M proteins, and developed a classification pipeline using ESM embeddings.
  • Created a curated, challenging dataset using similarity-based clustering to ensure robust evaluation of model generalization and remote homology detection capabilities.
  • Trained MLP classifiers for multi-class and binary classification.
  • Explainability: Implemented Attention Rollout technique for model interpretation.
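To illustrate the first method in the list above, here is a minimal sketch of Smith-Waterman local alignment applied to sequences of exons rather than amino acids. The exon encoding as `(length, phase)` tuples, the `exon_score` function, and the gap penalty are all hypothetical choices made for this example; the project's actual scoring scheme is not described in the source.

```python
from typing import Sequence, Tuple

# Hypothetical encoding: each exon is (length in nucleotides, phase 0/1/2).
Exon = Tuple[int, int]


def exon_score(a: Exon, b: Exon) -> int:
    """Assumed scoring: reward equal phase with similar length, penalize otherwise."""
    if a[1] != b[1]:
        return -2
    return 3 if abs(a[0] - b[0]) <= 3 else -1


def smith_waterman(xs: Sequence[Exon], ys: Sequence[Exon], gap: int = 2) -> int:
    """Return the best local alignment score between two exon sequences."""
    rows, cols = len(xs) + 1, len(ys) + 1
    h = [[0] * cols for _ in range(rows)]  # DP matrix, first row/column stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(
                0,  # local alignment: a score never drops below zero
                h[i - 1][j - 1] + exon_score(xs[i - 1], ys[j - 1]),  # align two exons
                h[i - 1][j] - gap,  # gap in ys
                h[i][j - 1] - gap,  # gap in xs
            )
            best = max(best, h[i][j])
    return best


# Made-up exon structures for two genes (illustrative values only).
gene_a = [(120, 0), (87, 1), (201, 2), (150, 0)]
gene_b = [(45, 2), (88, 1), (200, 2), (152, 0), (60, 1)]
print(smith_waterman(gene_a, gene_b))  # 9: three consecutive exons align
```

Because the alignment is local, two genes can score highly when only a stretch of their exon structure matches, which is the property that makes this approach a candidate for detecting remote homologues.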
Results
The Exon structure similarity method achieved 100% recall and 51.8% precision on the reviewed dataset, identifying 53 potential false positives out of 14,000 genes. It also found 28 candidate genes in the unreviewed set.
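As a reading aid for these figures, the sketch below computes precision, recall, and F1 from raw counts. The count of 57 true positives is not stated in the source; it is an inferred value that would reproduce the reported precision if the 53 potential false positives were the only false positives and no true members were missed.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Assumed counts: tp=57 is inferred, fp=53 is from the text, fn=0 follows from 100% recall.
precision, recall, f1 = precision_recall_f1(tp=57, fp=53, fn=0)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

A method tuned for full recall, as here, accepts lower precision in exchange for not missing known family members, so the flagged false positives become a shortlist for manual review.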
The Language Model approach demonstrated promising results, with 77% accuracy on a curated challenging dataset for multi-class classification.
The binary classifier for the Cytokine clan achieved perfect F1 scores and identified additional potential clan members in unreviewed proteins. Attention maps provided insights into the model’s focus on signal peptide (SP) regions, a pattern observed in almost all of the identified candidates.
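For readers unfamiliar with Attention Rollout, here is a minimal NumPy sketch of the general technique: average the attention heads in each layer, add the identity matrix to account for residual connections, re-normalize the rows, and multiply the layers together. The input here is random placeholder attention, not output from ESM2, and the function name and shapes are assumptions for illustration only.

```python
import numpy as np


def attention_rollout(attentions):
    """Roll out attention across layers.

    attentions: list of arrays shaped (heads, tokens, tokens), one per layer,
    where each row is a probability distribution over tokens.
    Returns a (tokens, tokens) matrix of aggregated attention.
    """
    n = attentions[0].shape[-1]
    rollout = np.eye(n)
    for layer_attn in attentions:
        a = layer_attn.mean(axis=0)               # average over heads
        a = a + np.eye(n)                         # model the residual connection
        a = a / a.sum(axis=-1, keepdims=True)     # keep each row summing to 1
        rollout = a @ rollout                     # compose with earlier layers
    return rollout


# Placeholder input: 4 layers, 2 heads, 5 tokens of softmax-normalized random scores.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 2, 5, 5))
weights = np.exp(logits)
layers = list(weights / weights.sum(axis=-1, keepdims=True))

rollout = attention_rollout(layers)
print(rollout.shape)
```

Each row of the result estimates how much every input position contributes to a given output position, which is the kind of map that can highlight a region such as a signal peptide.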
Overall, the project successfully developed a proof of concept for using Language Models in protein classification, enabling future work on more advanced families and categorizations.
More projects
Project catalog
Web Technology Detection
Finding Genes with Similar Functional Homology
Text Classification Modeling Improvements
Dynamic Range Prediction for Vehicles
Diastole/Systole Identification in ECG
Wix Restaurants Segmentation
LLM Agent-Managed Interface for Business Monitoring
Detecting Faulty Sensor Data
Hourly Demand Prediction
Product Clustering and Fraud Risk Assessment
Oral Anomaly Detection
Botnets Site Clustering
Multi-domain Digital Pathology Stain Style Transfer
Classification and Prediction of Epileptic Activity in EEG
Multi-domain digital pathology style transfer
A comparative analysis for XAI methods package
Automatic product comprehension with LLMs
Document validation and extraction
Text clustering for care management quality performance
Grinvision foundation model
Genomic profile representation for prediction of drug response
Generative creation of newsletter campaigns
Client application classification using exposed information
Behavioral cross-session user identification
SQL injection detection in real time
Healthscope: medical classification and contextual analysis
Content recommendation engine for emails
+16 more projects
Y-DATA

About Y-DATA

We equip our graduates with a strong professional skillset and up-to-date knowledge of the latest AI developments. Through coursework, industry projects, research seminars, and professional mentorship, we ensure our graduates are well-prepared for the demands of the field.
Powered by Nebius
Nebius, established in late 2023, is a leading AI-centric public cloud platform designed to support the entire machine learning lifecycle. With a focus on empowering ML practitioners, Nebius offers comprehensive infrastructure and aims to become the preferred platform for generative AI developers.