Context-Aware Video Recommendation via Transcript Embeddings and LLM-Based Hashtag Generation

Thirunaavukkarasu Murugesan

doi:10.22399/ijcesen.5153

Keywords:

Video Recommendation Systems, Transcript Embeddings, Large Language Models, Semantic Similarity Computation, Educational Content Discovery

Abstract

Video recommendation systems typically rely on user behavior patterns and collaborative filtering, which makes them prone to popularity bias and to filter bubbles that homogenize content. This article presents a complete system implementation that uses video transcript embeddings and hashtags generated by a large language model to enable semantic recommendations. The system uses OpenAI Whisper to transcribe speech, Sentence-BERT to produce dense embeddings of the transcripts, GPT-4 with specially designed prompts to extract hashtags, and FAISS with IndexIVFPQ for fast approximate similarity search. Prompt engineering experiments show that domain-adaptive prompts achieve higher precision and recall on computer science content and substantially outperform both baseline prompts and platform-generated tags on F1 score. The complete system processes videos at real-time speeds on NVIDIA A100 GPUs, constructs indexes efficiently for large video collections, and delivers top-ranked recommendations with low latency. Reproducible experiments on educational videos across computer science, mathematics, physics, and biology demonstrate a significant relevance improvement over description-based search in cold-start scenarios and a substantial improvement in long-tail content exposure. All implementation details, prompt templates, evaluation datasets, and performance benchmarks are provided to enable replication and extension.
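The two computations the abstract leans on, semantic similarity search over transcript embeddings and precision/recall/F1 scoring of generated hashtags, can be illustrated with a minimal sketch. This is not the article's implementation. It assumes exact brute-force cosine search as a stand-in for the approximate FAISS IndexIVFPQ index, tiny hand-written vectors in place of Sentence-BERT embeddings, and set-based exact matching of hashtags; the function names, video identifiers, and hashtags are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=3):
    """Exact brute-force search (assumption: stands in for an approximate index)."""
    scored = [(video_id, cosine(query_vec, vec)) for video_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def hashtag_f1(predicted, reference):
    """Set-based precision, recall, and F1 (assumption: exact string matching)."""
    pred, ref = set(predicted), set(reference)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy 3-dimensional "embeddings" (hypothetical; real sentence embeddings
# have hundreds of dimensions).
toy_index = {
    "sorting_lecture": [0.9, 0.1, 0.0],
    "graph_lecture": [0.7, 0.6, 0.1],
    "cell_biology": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    for video_id, score in top_k([1.0, 0.0, 0.0], toy_index, k=2):
        print(f"{video_id}: {score:.3f}")
    print(hashtag_f1(
        ["#sorting", "#quicksort", "#python"],
        ["#sorting", "#quicksort", "#algorithms", "#recursion"],
    ))
```

Because ranking depends only on the embedding of the content itself, a newly uploaded video with no interaction history can still be retrieved, which is the cold-start property the abstract highlights. At scale, the brute-force loop would be replaced by an approximate index, trading a small amount of recall for much lower latency.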
