Resume GPT

Association: N/A
Duration: 3 weeks

Machine Learning · OpenAI API · Python

This project is an AI-powered Resume Review Tool designed to help job seekers optimize their resumes for Applicant Tracking Systems (ATS) and improve their chances of landing interviews. Built with advanced Natural Language Processing (NLP) techniques, the tool analyzes resumes for key factors such as ATS compatibility, keyword matching, job title alignment, quantifiable achievements, action verb usage, and readability. By comparing the resume against a provided job description, the tool generates actionable feedback, including an ATS compatibility score, prioritized improvements, and suggestions for enhancing content and structure.
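As a flavor of one of these checks, here is a minimal sketch of action-verb scoring; the verb list and function name are illustrative, not the project's actual implementation:

```python
# Illustrative heuristic: what fraction of resume bullet points
# open with a strong action verb? (Verb list is a small sample.)
ACTION_VERBS = {"led", "built", "launched", "designed", "improved",
                "reduced", "automated", "analyzed", "delivered"}

def action_verb_ratio(bullets):
    """Return the share of bullets whose first word is an action verb."""
    scored = [b for b in bullets if b.strip()]
    if not scored:
        return 0.0
    hits = sum(
        1 for b in scored
        if b.strip().split()[0].lower().strip(",.") in ACTION_VERBS
    )
    return hits / len(scored)
```

A bullet like "Led a team of 5 engineers" scores as a hit, while "Responsible for reporting" does not, which is exactly the kind of rewrite the tool nudges users toward.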

The idea was born out of a desire to address the challenges job seekers face in today's competitive market. As someone who has struggled with ATS filters myself and watched countless talented candidates fight to stand out in a sea of applications, I wanted to create a tool that simplifies the process. By leveraging AI and NLP, this tool aims to give candidates actionable insights, helping them craft resumes that not only pass ATS filters but also highlight their unique strengths and achievements. My goal is to level the playing field, ensuring every candidate has the tools they need to succeed in their job search.

# Method of the analyzer class; module-level imports it relies on:
#   import re
#   from collections import Counter
#   from nltk.corpus import stopwords  # needs: nltk.download('stopwords')

    def keyword_matching(self, resume_text, job_description):
        """
        Extract weighted keywords from the job description and compare them
        to the resume, returning matched and missing terms with a match score.
        """

        # Initialize NLTK's English stopwords
        stop_words = set(stopwords.words('english'))
        
        # Add more words to ignore (common job posting filler words)
        additional_stopwords = {
            'will', 'able', 'role', 'responsibility', 'responsibilities', 'required', 'years',
            'job', 'position', 'candidate', 'candidates', 'include', 'including', 'experience',
            'work', 'working', 'ability', 'accommodation', 'must', 'should', 'can', 'strong',
            'demonstrated', 'demonstrable', 'excellent', 'outstanding',
            'successful', 'success', 'prefer', 'preferred', 'minimum', 'maximum', 'ideal',
            'ideally', 'plus', 'bonus', 'qualification', 'qualifications', 'skill', 'skills',
            'across', 'within', 'using', 'use', 'understanding', 'understand', 'knowledge',
            'related', 'degree', 'degrees', 'education', 'educational', 'field', 'fields',
            'need', 'service', 'services', 'look', 'looking', 'help', 'join', 'team', 'teams',
            'new', 'industry', 'industries', 'sector', 'sectors', 'company', 'companies',
        }
        stop_words.update(additional_stopwords)
        
        # Technical terms that shouldn't be considered stopwords
        technical_terms = {
            'ai', 'ml', 'ui', 'ux', 'api', 'sql', 'aws', 'gcp', 'azure', 'css', 'js', 
            'ci', 'cd', 'bi', 'pm', 'vp', 'cto', 'ceo', 'cfo', 'coo', 'cpo'
        }
        for term in technical_terms:
            if term in stop_words:
                stop_words.remove(term)
        
        def clean_text(text):
            """Basic text cleaning: lowercase and remove punctuation/extra whitespace"""
            text = text.lower()
            text = re.sub(r'[^\w\s\-/]', ' ', text)  # Keep hyphens and slashes for technical terms
            text = re.sub(r'\s+', ' ', text).strip()
            return text
        
        def extract_meaningful_phrases(text):
            """Extract meaningful phrases from text focusing on skills and qualifications"""
            text = clean_text(text)
            
            # Define patterns for important phrases
            patterns = [
                # Technical skills
                r'\b(?:python|java|javascript|typescript|react|angular|vue|node\.?js|express|django|flask|sql|nosql|mongodb|postgresql|mysql|oracle|aws|azure|gcp|docker|kubernetes|terraform|jenkins|git|github|gitlab|agile|scrum|kanban|jira|confluence|tableau|power\s*bi|excel|vba|ml|ai|machine\s*learning|deep\s*learning|nlp|natural\s*language\s*processing|data\s*science|data\s*visualization|data\s*analysis|data\s*engineering|data\s*modeling|product\s*management|product\s*development|product\s*strategy|ux|ui|design\s*thinking|figma|sketch|devops|ci\/cd|continuous\s*integration|continuous\s*deployment)\b',
                
                # Business skills and domains
                r'\b(?:product\s*manager|product\s*owner|project\s*manager|program\s*manager|business\s*analyst|systems\s*analyst|solutions\s*architect|technical\s*architect|enterprise\s*architect|financial\s*analysis|financial\s*modeling|financial\s*planning|budget\s*management|strategic\s*planning|roadmap\s*development|stakeholder\s*management|team\s*leadership|team\s*management|cross\-functional|client\s*relationship|vendor\s*management|executive\s*presentation|data\-driven|decision\s*making|problem\s*solving|critical\s*thinking|communication\s*skills|fintech|insurtech|regtech|health\s*tech|e\-commerce|saas|artificial\s*intelligence|blockchain|cryptocurrency|payments|lending|wealth\s*management|retirement|asset\s*management|investment\s*management|risk\s*management|compliance|regulatory|security|user\s*experience|customer\s*experience|market\s*analysis|competitor\s*analysis|growth\s*strategy|marketing\s*strategy|digital\s*transformation|change\s*management|innovation|analytics|metrics|kpis|okrs)\b',
                
                # Education and certifications
                r'\b(?:mba|phd|master|bachelor|bs|ba|ms|cfa|cpa|pmp|csm|cspo|safe|prince2|itil|six\s*sigma|lean|comptia|cisco|microsoft\s*certified|aws\s*certified|google\s*certified|azure\s*certified)\b'
            ]
            
            # Extract all matches
            all_matches = []
            for pattern in patterns:
                matches = re.findall(pattern, text)
                all_matches.extend(matches)
            
            # Get words and bigrams (for anything not captured by patterns)
            words = text.split()
            
            # Filter out stopwords for individual words
            filtered_words = [word for word in words if word not in stop_words and len(word) > 2]
            
            # Extract bigrams (two-word phrases)
            bigrams = []
            for i in range(len(words) - 1):
                # Only keep bigrams where at least one word is not a stopword
                if words[i] not in stop_words or words[i+1] not in stop_words:
                    # Don't include bigrams that are just numbers or very short words
                    if (len(words[i]) > 2 or words[i].isdigit()) and (len(words[i+1]) > 2 or words[i+1].isdigit()):
                        bigram = words[i] + ' ' + words[i+1]
                        bigrams.append(bigram)
            
            # Combine all extracted terms
            all_terms = all_matches + filtered_words + bigrams
            return all_terms
        
        # Extract terms from job description
        job_terms = extract_meaningful_phrases(job_description)
        
        # Count frequencies
        term_counts = Counter(job_terms)
        
        # Apply domain-specific weights
        weighted_terms = {}
        for term, count in term_counts.items():
            weight = count
            
            # Give higher weights to specific domain terms
            if re.search(r'finance|financial|banking|investment|insurance', term):
                weight *= 1.5
            
            # Give higher weights to management/leadership terms if relevant
            if re.search(r'manager|director|lead|leadership|strategy', term):
                weight *= 1.3
            
            # Give higher weights to technical skills
            if re.search(r'python|java|sql|data|algorithm|machine learning|ai', term):
                weight *= 1.4
                
            # Give higher weights to product-related terms
            if re.search(r'product|roadmap|feature|requirement|backlog|agile', term):
                weight *= 1.5
                
            weighted_terms[term] = weight
        
        # Sort terms by weight
        sorted_terms = sorted(weighted_terms.items(), key=lambda x: x[1], reverse=True)
        
        # Extract top N most important terms, removing duplicates
        top_n = 15
        seen = set()
        important_keywords = []
        
        for term, _ in sorted_terms:
            # Skip similar terms (e.g., don't include both "python" and "python programming")
            should_skip = False
            for existing in important_keywords:
                if term in existing or existing in term:
                    should_skip = True
                    break
                    
            if not should_skip and term not in seen:
                important_keywords.append(term)
                seen.add(term)
                
            if len(important_keywords) >= top_n:
                break
        
        # Clean resume for comparison
        clean_resume = clean_text(resume_text)
        
        # Check for keywords in resume
        matched = []
        missing = []
        
        for keyword in important_keywords:
            if keyword.lower() in clean_resume:
                matched.append(keyword)
            else:
                # Check for variations (especially for multi-word terms)
                keyword_parts = keyword.split()
                if len(keyword_parts) > 1:
                    # For multi-word keywords, check if any variations exist
                    variations = [
                        '-'.join(keyword_parts),
                        '/'.join(keyword_parts),
                        ''.join(keyword_parts)
                    ]
                    found = False
                    for var in variations:
                        if var.lower() in clean_resume:
                            matched.append(keyword)
                            found = True
                            break
                            
                    # Also check whether every non-stopword part appears somewhere in the resume
                    if not found:
                        all_parts_exist = True
                        for part in keyword_parts:
                            if part not in stop_words and part not in clean_resume:
                                all_parts_exist = False
                                break
                        
                        if all_parts_exist:
                            matched.append(keyword)
                            found = True
                            
                    if not found:
                        missing.append(keyword)
                else:
                    # For single words, check for similar forms using stemming/lemmatization logic
                    similar_forms = {
                        'manage': ['manager', 'management', 'managing'],
                        'develop': ['developer', 'development', 'developing'],
                        'analyze': ['analyst', 'analysis', 'analytical'],
                        'finance': ['financial', 'financing'],
                        'strategy': ['strategic', 'strategist'],
                        'lead': ['leader', 'leadership']
                    }
                    
                    found = False
                    for base, forms in similar_forms.items():
                        if keyword in forms or keyword == base:
                            for form in forms + [base]:
                                if form in clean_resume:
                                    matched.append(keyword)
                                    found = True
                                    break
                        if found:
                            break
                            
                    if not found:
                        missing.append(keyword)
        
        # Calculate match percentage
        match_percentage = (len(matched) / max(len(important_keywords), 1)) * 100
        
        return {
            "match_percentage": round(match_percentage, 2),
            "matched": matched,
            "missing": missing,
            "important_keywords": important_keywords  # Return this for debugging
        }
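The dictionary returned above feeds the feedback step. Below is a minimal sketch of turning it into an LLM prompt; the function name and prompt wording are my own, and the project's actual prompting may differ:

```python
def build_feedback_prompt(match_result):
    """Turn keyword_matching() output into a resume-review prompt."""
    return (
        f"ATS keyword match: {match_result['match_percentage']}%.\n"
        f"Matched keywords: {', '.join(match_result['matched'])}.\n"
        f"Missing keywords: {', '.join(match_result['missing'])}.\n"
        "Suggest prioritized, concrete resume improvements."
    )

# The prompt would then be sent through the OpenAI client, e.g.:
#   client.chat.completions.create(
#       model=..., messages=[{"role": "user", "content": prompt}])
```

Keeping the scoring deterministic and using the model only for narrative feedback keeps the ATS score reproducible across runs.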

I also wrote about the ethical aspects of using AI in the hiring process, along with a suggested framework for organizations and businesses to adopt.

Let's Talk

© 2025. All rights Reserved.
