Sabana, CRI
1 day ago
Web Scraping Architect

We are seeking an experienced and forward-thinking Web Scraping Architect to lead the technical design and strategy of our data acquisition solutions. This role is a natural progression for a senior developer who excels at solving complex data extraction challenges and is ready to take on greater responsibility for system design and technical guidance. You will be instrumental in evolving our scraping capabilities, ensuring the solutions built by our Python and Java teams are scalable, resilient, and efficient.

As a key technical leader, you will bridge the gap between high-level strategy and practical implementation. You will guide development teams by creating robust architectural patterns, establishing best practices, and making critical technology recommendations. This is a hands-on leadership role focused on elevating our technical excellence and ensuring our data acquisition platform can meet the future needs of the business.

What You'll Do

Lead Technical Design: Guide the architecture of our web scraping projects in both Python and Java. Create and document scalable, reusable components and patterns for the development teams to implement.

Establish Best Practices: Define and champion the technical standards for data acquisition. This includes creating guidelines for code quality, data validation, error handling, and efficient resource management.

Solve Complex Scraping Challenges: Serve as the go-to expert for the toughest data extraction problems. Analyze complex websites, evaluate anti-scraping technologies, and design effective, resilient solutions.

Evaluate Tools and Technologies: Research, prototype, and recommend new tools, libraries, and techniques to keep our scraping stack modern and effective. This includes evaluating proxy services, browser automation tools, and data parsing libraries.

Collaborate on Strategy: Work closely with product managers and data analysts to understand business requirements and translate them into clear technical designs and actionable development tasks.

Mentor Engineering Teams: Provide technical mentorship to senior developers, helping them grow their skills in system design, scalability, and advanced scraping techniques.    

Promote Ethical Scraping: Ensure all data acquisition activities align with ethical best practices and legal guidelines by building respect for robots.txt directives and site Terms of Service into our technical designs.
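As one illustration of embedding robots.txt handling into a design, Python's standard-library `urllib.robotparser` can enforce a site's rules before any request is made. The rules and user agent below are hypothetical; in practice the file is fetched from the target site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice this is fetched from
# https://example.com/robots.txt before crawling the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Check each URL against the site's rules before fetching it.
print(parser.can_fetch("AcmeBot", "https://example.com/products"))   # allowed
print(parser.can_fetch("AcmeBot", "https://example.com/private/x"))  # disallowed
```

Centralizing a check like this in shared scraping components makes compliance the default rather than a per-project afterthought.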

What Experience You'll Need

A Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or a related field.

7+ years of professional software engineering experience, including at least three years focused on building and maintaining large-scale web scraping solutions.

Polyglot Scraping Frameworks: Deep practical expertise in designing solutions with core scraping frameworks in both Python (e.g., Scrapy, Playwright) and Java (e.g., Jsoup, Selenium, Crawler4j). You should be able to guide teams on when to use a full browser automation tool versus a lighter-weight HTTP client.
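That browser-versus-HTTP-client guidance often reduces to a simple rule of thumb. The helper below is a hypothetical sketch of that decision rule, not part of any framework:

```python
def choose_fetch_strategy(renders_via_js: bool, needs_user_interaction: bool) -> str:
    """Pick the cheapest tool that can actually get the data.

    Browser automation (Playwright, Selenium) is typically far more
    expensive per page than a plain HTTP client (requests, Jsoup),
    so reserve it for pages that genuinely need a JavaScript runtime.
    """
    if renders_via_js or needs_user_interaction:
        return "browser"   # e.g. Playwright / Selenium
    return "http-client"   # e.g. requests + parser / Jsoup / Scrapy

# Static HTML listing page: a lightweight HTTP client is enough.
print(choose_fetch_strategy(renders_via_js=False, needs_user_interaction=False))
# Single-page app that builds the DOM client-side: drive a real browser.
print(choose_fetch_strategy(renders_via_js=True, needs_user_interaction=False))
```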

Scalable System Architecture: Proven experience designing scalable data acquisition platforms. This includes hands-on knowledge of containerization with Docker, distributing tasks with job queuing systems (e.g., Kafka, SQS), and managing complex workflows with orchestration tools like Apache Airflow.    
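The fan-out pattern behind that architecture can be sketched in-process with the standard library. In production the queue would be Kafka or SQS and each worker a Docker container, so treat this purely as an illustration of the consumer pattern:

```python
import queue
import threading

# Stand-in for a Kafka topic / SQS queue holding URLs to scrape.
jobs: queue.Queue = queue.Queue()
results: list = []
lock = threading.Lock()

def worker() -> None:
    """Consume URLs until the queue is drained; one of these per container."""
    while True:
        try:
            url = jobs.get_nowait()
        except queue.Empty:
            return
        # Real code would fetch and parse the page here.
        with lock:
            results.append(f"scraped:{url}")
        jobs.task_done()

for url in ("https://example.com/a", "https://example.com/b", "https://example.com/c"):
    jobs.put(url)

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))
```

Because workers only coordinate through the queue, scaling out is a matter of starting more consumers, which is exactly what an external broker plus container orchestration buys you.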

API Discovery and Evasion: Demonstrated ability to analyze website behavior to find efficient data extraction points. This includes proficiency with Browser Developer Tools (Network Tab) and intercepting proxies (e.g., mitmproxy) to reverse-engineer private APIs and overcome common anti-bot measures.
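Once a private JSON endpoint has been identified in the Network tab, paging through it directly is far cheaper than rendering HTML. The stub below simulates such an endpoint (the endpoint shape and `next_cursor` field are hypothetical) to show the typical cursor-walking loop:

```python
# Simulated responses from a private API discovered via browser DevTools.
# A real client would issue HTTP GETs (e.g. with requests) instead.
FAKE_PAGES = {
    None: {"items": [1, 2], "next_cursor": "p2"},
    "p2": {"items": [3, 4], "next_cursor": "p3"},
    "p3": {"items": [5], "next_cursor": None},
}

def fetch_page(cursor):
    """Stand-in for GET https://example.com/api/items?cursor=... (hypothetical)."""
    return FAKE_PAGES[cursor]

def crawl_all_items() -> list:
    """Follow next_cursor until the API signals the last page."""
    items, cursor = [], None
    while True:
        page = fetch_page(cursor)
        items.extend(page["items"])
        cursor = page["next_cursor"]
        if cursor is None:
            return items

print(crawl_all_items())  # [1, 2, 3, 4, 5]
```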

English proficiency at B2 level or above.

What Could Set You Apart

A professional cloud architecture certification, such as AWS Certified Solutions Architect - Professional or Google Cloud Professional Cloud Architect.

Experience with additional workflow orchestration tools beyond Apache Airflow, such as Prefect.

Familiarity with container orchestration systems such as Kubernetes for running Dockerized scraping workloads at scale.

Practical experience using cloud platforms (AWS, GCP, Azure) for deploying and managing scraping infrastructure.    

Knowledge of Infrastructure as Code (IaC) tools like Terraform.

Previous experience contributing to technical documentation or mentoring other developers.

Primary Location:

CRI-Sabana

Function:

Tech Dev and Client Services

Schedule:

Full time