Join the Orcawise Innovation Program

The #1 work experience program for AI  

Web Data Engineer


Job Title: Web Data Engineer (OIP – Volunteer Program)
Program: Orcawise Innovation Program (OIP) – 100-Day Work Experience
Location: Remote
Type: Volunteer | Training + Project Experience

About the Role

As a Web Data Engineer in the Orcawise Innovation Program, you’ll contribute to AI model development by building high-quality, ethically sourced web scraping pipelines. You will work directly with AI and product teams to extract structured data used in compliance-focused large language models, including tools aligned with the EU AI Act.

This role requires prior technical experience with Python-based scraping tools, APIs, and data handling libraries.

What You’ll Do

  • Develop, maintain, and scale web scraping scripts using Python
  • Automate data collection from complex websites (JavaScript-heavy or paginated content)
  • Structure and clean scraped data into JSON, CSV, or database-ready formats
  • Extract data via both HTML scraping and public APIs
  • Handle anti-bot measures (headers, user-agent rotation, delays, CAPTCHAs) responsibly
  • Work with LLM engineers to identify useful data sources and formats
  • Follow legal and ethical scraping guidelines
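
To give a sense of the "structure and clean scraped data" task above, here is a minimal sketch using only the Python standard library. The sample markup, the `LinkExtractor` class, and the `url`/`title` field names are assumptions made for illustration; they are not taken from an actual Orcawise pipeline, and a real project would more likely use BeautifulSoup or Scrapy.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical sample markup; a real page would be fetched over HTTP.
SAMPLE_HTML = """
<ul>
  <li class="item"><a href="/docs/a">  Article A </a></li>
  <li class="item"><a href="/docs/b">Article B</a></li>
</ul>
"""


class LinkExtractor(HTMLParser):
    """Collect the href and visible text of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record being built while inside an <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = {"url": dict(attrs).get("href", ""), "title": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Cleaning step: collapse stray whitespace in the scraped text.
            self._current["title"] = " ".join(self._current["title"].split())
            self.records.append(self._current)
            self._current = None


def extract_records(html):
    """Parse HTML and return a list of {'url', 'title'} dicts."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


def to_json(records):
    """Serialize records as a JSON array."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialize records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["url", "title"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = extract_records(SAMPLE_HTML)
```

Calling `to_json(records)` or `to_csv(records)` then yields the same two rows in either format, ready to be written to a file or loaded into a database.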

Required Technical Skills

You should be comfortable with the following tools and concepts before applying:

  • Python (intermediate or higher)
  • BeautifulSoup, Scrapy, or Selenium
  • API integration and request handling using requests, httpx, or similar
  • Regex and XPath for targeting data
  • Working with JSON, CSV, and Pandas for data transformation
  • Git (for version control and collaboration)
  • Awareness of robots.txt, rate-limiting, and scraping ethics
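
As an illustration of the robots.txt and rate-limiting awareness listed above, the following sketch checks URLs against a robots.txt policy and waits between requests. The robots.txt content, the `example-bot` user agent, and the `crawl`, `allowed_urls`, and `polite_delay` helpers are hypothetical names chosen for this example; only `urllib.robotparser.RobotFileParser` comes from the Python standard library.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "example-bot"  # hypothetical user-agent string


def build_parser(robots_text):
    """Build a RobotFileParser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed_urls(parser, urls, user_agent=USER_AGENT):
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def polite_delay(parser, user_agent=USER_AGENT, default=1.0):
    """Use the site's Crawl-delay if it declares one, else a default pause."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def crawl(parser, urls, fetch, sleep=time.sleep):
    """Fetch each permitted URL, pausing between requests.

    `fetch` is supplied by the caller (for example a wrapper around
    requests.get), so this sketch stays independent of any HTTP library.
    """
    delay = polite_delay(parser)
    results = []
    for url in allowed_urls(parser, urls):
        results.append(fetch(url))
        sleep(delay)
    return results


parser = build_parser(SAMPLE_ROBOTS)
urls = ["https://example.com/docs/a", "https://example.com/private/x"]
```

Passing `fetch` and `sleep` in as arguments is a design choice that keeps the sketch testable offline; in practice `fetch` would perform the HTTP request and `sleep` would be left at its default.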

Nice to Have

  • Experience with browser automation tools (e.g. Playwright for Python, or Puppeteer via its Python port Pyppeteer)
  • Knowledge of proxy management and user-agent rotation
  • Familiarity with Docker, Postgres, or cloud storage (S3)

What You’ll Gain

  • Real-world experience building scrapers for AI model development
  • Mentorship from the Orcawise AI engineering team
  • Portfolio-ready projects in legal-tech and AI governance
  • Exposure to Responsible AI workflows and legal compliance requirements
  • Certificate of completion and career support post-program

⚠️ Eligibility Requirement

Applicants may be asked to complete a short scraping task as part of the selection process to demonstrate their technical ability and understanding of ethical data sourcing.

Job details

Experience: 3+ years
No. of Vacancies: 1 spot available
Working Hours: 4 hours per day
Salary: Volunteer work experience
Working Days: Monday - Friday

Join the #1 work experience program for AI professionals

Work in a live production environment under the guidance and direction of top mentors. Graduate with a Certificate of Completion and an employer reference after you successfully complete the 100-day work experience.
