
Radix Analytics Private Limited

Topic Identification


Challenges for EdTech – a Data Science view

  • Provide access to voluminous learning material and testing resources while remaining focused on the topic at hand
  • Personalize the learning experience
  • Help comprehend complex concepts by presenting the same content from appropriate authors and teachers
  • Enhance Testing through adaptive testing technologies
  • Enhance Learning through adaptive learning technologies
MEROS

Need for Topic Identification in EdTech

  • A lot of legacy content, especially testing material such as past exam papers, is available without topic mapping
  • Without topic tags, efficient content management and easy access to material are hindered
  • Adaptive learning systems need topic labels to provide relevant, level-appropriate material to learners
  • Enhanced Engagement & Motivation through content relevance and challenge level
  • Scalability and Accessibility by quicker content updates and making education more accessible to diverse audiences

Utilizing NLP in Topic Identification

  • NLP transforms raw text into data structures for machine learning
  • Effectively captures context, nuances, and semantics in content, enabling accurate topic identification
  • Automates the process of tagging content with relevant topics, drastically reducing manual effort and time
  • Can be applied across subjects, from languages and humanities to symbolic content such as maths and organic chemistry
  • The model, once developed, enables real-time topic identification, facilitating dynamic and responsive educational platforms
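To make the first point concrete, the sketch below turns raw question text into numeric vectors that a classifier could consume. It uses a plain TF-IDF weighting written with the standard library only. The function names (`tokenize`, `tfidf_vectors`) and the sample questions are illustrative assumptions, not the vectorization used in the case study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) with one TF-IDF weight list per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append([
            # Term frequency times a smoothed inverse document frequency.
            (counts[t] / total) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t in vocab
        ])
    return vocab, vectors


# Hypothetical sample questions, for illustration only.
questions = [
    "Which force keeps a satellite in orbit around the Earth?",
    "Which organelle is the site of protein synthesis in a cell?",
]
vocab, vectors = tfidf_vectors(questions)
```

Terms that appear in only one question (such as "satellite") receive a higher weight than terms shared by both (such as "which"), which is what lets a downstream model separate topics.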

Case Study

Topic Prediction in MCQ Bank For EdTech

Issues & Objectives

  • Enable the EdTech platform to serve topic-focused MCQs
  • Legacy exam preparation material and exam papers do not carry topic-tagged MCQs
  • Large volumes of MCQs are available, and more are generated every semester (about 100K MCQs per subject)
  • We built an ML- and NLP-based system that accurately predicts the topic/chapter for each MCQ across all subjects

Challenges

  • Subjects: Physics, Biology, Mathematics and Chemistry
  • MCQs contain few common English words
  • MCQs rely heavily on subject-specific terminology
  • Mathematical and chemical equations and diagrams in various formats, including images
  • The processing and prediction pipeline should be scalable and available on demand, aligned with the publication cycle
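One way to cope with sparse English text and dense notation is a tokenizer that keeps formulas and symbols intact instead of discarding them. The following is a minimal sketch under that assumption. The pattern set and the name `subject_tokens` are hypothetical, the formula rule is a rough heuristic (it also matches all-caps abbreviations such as "DNA"), and content embedded in images would still need OCR first.

```python
import re

# Alternatives are tried in order, so notation is recognised before plain words.
TOKEN_RE = re.compile(
    r"(?P<latex>\\[A-Za-z]+)"                      # LaTeX commands, e.g. \int
    r"|(?P<formula>\b(?:[A-Z][a-z]?\d*){2,}\b)"    # element-like runs, e.g. H2SO4
    r"|(?P<number>\d+(?:\.\d+)?)"                  # integers and decimals
    r"|(?P<word>[A-Za-z]+)"                        # ordinary words
    r"|(?P<op>[=+\-*/^<>])"                        # common operators
)


def subject_tokens(text):
    """Split text into tokens, lowercasing plain words but keeping notation as written."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        token = match.group()
        # Case carries meaning in formulas (Co vs CO), so only words are lowercased.
        tokens.append(token.lower() if match.lastgroup == "word" else token)
    return tokens


chemistry = subject_tokens("Find the pH of 0.1 M H2SO4")
maths = subject_tokens(r"Evaluate \int x^2 dx")
```

Here `chemistry` keeps `H2SO4` as a single token and `maths` keeps `\int` and `^`, so the notation survives as features rather than being stripped as noise.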

Decoding Complex Challenges

Physics


Biology


Chemistry


Mathematics


Solution & Results

  • Natural Language Processing (NLP):
    • Subject-specific Optical Character Recognition
    • Subject-specific vectorization
  • Customised pipeline
  • Algorithms: Deep Neural Network + Random Forest
  • Cloud infra scaled on demand
  • Recall across subjects ranged between 88% and 92%
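For readers unfamiliar with the metric, recall for a topic is the share of MCQs truly belonging to that topic that the model labels correctly. The sketch below computes it per topic and as an unweighted (macro) average. The slides do not say which averaging was used, so the macro average is an assumption, and the labels are made-up toy data, not the case-study results.

```python
from collections import Counter


def recall_per_topic(true_topics, predicted_topics):
    """Recall per topic: correct predictions divided by items that truly belong to it."""
    if len(true_topics) != len(predicted_topics):
        raise ValueError("label lists must have the same length")
    support = Counter(true_topics)
    hits = Counter(t for t, p in zip(true_topics, predicted_topics) if t == p)
    return {topic: hits[topic] / support[topic] for topic in support}


def macro_recall(true_topics, predicted_topics):
    """Unweighted mean of the per-topic recalls."""
    per_topic = recall_per_topic(true_topics, predicted_topics)
    return sum(per_topic.values()) / len(per_topic)


# Toy labels for illustration only.
true_topics = ["optics", "optics", "kinematics", "kinematics", "thermodynamics"]
predicted = ["optics", "kinematics", "kinematics", "kinematics", "thermodynamics"]
scores = recall_per_topic(true_topics, predicted)
overall = macro_recall(true_topics, predicted)
```

In this toy data one of the two optics questions is mislabelled, so optics recall is 0.5 while the other two topics score 1.0.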