Guide to creating a web scraper using Bubble.io and Puppeteer
Integrating code with the no-code tool Bubble.io sounds contradictory? Take a look at how we developed a scraper using Bubble, Node.js, and Ngrok.
Waleed Mudassar
May 17, 2022
The tools we used, explained briefly:
Bubble.io
The whole user experience and interaction of the application are built on Bubble.io. The low-code platform provides visual elements to create the user interface, workflows to handle user input, and a database to store data such as the data scraped from an e-commerce site.
Bubble.io Plugin
When we hit Bubble.io’s limits, we can extend it. One way is by developing a plugin: within a plugin, you can execute custom code or create your own visual elements for the user interface. We’ll be using the API Connector plugin provided by Bubble.
Node
Node.js is a single-threaded, open-source, cross-platform runtime environment for building fast and scalable server-side and networking applications. It runs on the V8 JavaScript runtime engine, and it uses event-driven, non-blocking I/O architecture, which makes it efficient and suitable for real-time applications.
Puppeteer
Puppeteer is a Node library that provides a high-level API to control headless Chrome or Chromium browsers over the DevTools Protocol. It can also be configured to use full (non-headless) Chrome or Chromium.
Express
Express is a minimal and flexible Node.js web application framework. It allows setting up middleware to respond to HTTP requests and defines a routing table used to perform different actions based on the HTTP method and URL.
Ngrok
Ngrok is a cross-platform application that enables developers to expose a local development server to the Internet with minimal effort. The software makes your locally hosted web server appear to be hosted on a subdomain of ngrok.com, meaning that no public IP or domain name is needed on the local machine.
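Once the local server is running (on port 3000 in this guide), exposing it is a single command; ngrok then prints a public forwarding URL that Bubble’s API Connector can call:

```shell
# Start a tunnel that forwards a public ngrok URL to the local server on port 3000.
ngrok http 3000
```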
VS Code
Visual Studio Code is a streamlined code editor with support for development operations like debugging, task running, and version control. It aims to provide just the tools a developer needs for a quick code-build-debug cycle and leaves more complex workflows to fuller-featured IDEs, such as Visual Studio.
Pre-requisites
Download and Install
Node.js ( https://nodejs.org/en/download ), which includes npm
VS Code ( https://code.visualstudio.com/download )
Open a terminal in your project folder and run "npm init -y" to create a package.json
Install Puppeteer with the command "npm install puppeteer"
Install Express with the command "npm install express"
Let’s code
Create a file named “index.js”
Import the Puppeteer module within the “index.js” file
const puppeteer = require('puppeteer');
Import the Express framework within the “index.js” file
const express = require('express');
Instantiate the Express app
const app = express();
Set our port:
const port = 3000;
The port will be used a bit later when we tell the app to listen to requests.
Finalized selectors
Our scraper uses CSS selectors to find HTML elements in web pages and extract data from them. When inspecting an element, your browser’s DevTools can suggest a selector for it, but you can also write one yourself and test it in the DevTools console before using it in code.
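As a sketch, the finalized selectors can be kept in one place. The values below are hypothetical; the real ones depend entirely on the target site’s markup and should be verified in the DevTools console first (e.g. by running document.querySelector('h1.product-title') there).

```javascript
// Hypothetical selectors for a generic product page; adjust to the target site.
const selectors = {
  name: 'h1.product-title',   // element containing the product name
  price: 'span.price-value',  // element containing the price
};
```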
An empty JSON object that will be sent to Bubble later, once the data has been scraped and stored in it:
let productDetail = {
  name: '',
  price: ''
}
We need to keep in mind that Puppeteer is a promise-based library: It performs asynchronous calls to the headless Chrome instance under the hood. Let’s keep the code clean by using async/await. For that, we need to define an async function and put all the Puppeteer code in there.
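A minimal sketch of that pattern, with a stand-in promise instead of a real Puppeteer call: each `await` pauses the function until the promise resolves, so promise-based calls read like sequential code.

```javascript
// `fakeBrowserCall` stands in for a real Puppeteer call such as puppeteer.launch():
// it returns a promise that resolves a moment later.
const fakeBrowserCall = (label) =>
  new Promise((resolve) => setTimeout(() => resolve(label), 10));

// All the asynchronous work lives inside one async function, keeping it linear.
async function scrape() {
  const first = await fakeBrowserCall('launched');
  const second = await fakeBrowserCall('navigated');
  return [first, second];
}
```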
Define an HTTP GET endpoint to accept requests from the Bubble server
When a user hits the endpoint with a GET request, Express will return the JSON object to the Bubble application. We’d like it to represent the product page, so the URL for the endpoint is /product:
app.get('/product', async (req, res) => {
Launch the browser
const browser = await puppeteer.launch()
Open a new tab
const page = await browser.newPage()
Puppeteer’s newPage() method creates a new page instance in the browser, and these page instances can do quite a few things. Here we create a page instance and then use the page.goto() method to navigate to the target site.
Pass URL of target site
await page.goto(req.query.url)
Save the scraped data from the HTML element’s selector (the product name) into the JSON object
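A hedged sketch of that extraction step, with made-up selectors: page.$eval(selector, fn) finds the first element matching the selector and runs fn on it inside the browser context, returning the result to Node.

```javascript
// Sketch of the extraction step (selectors are hypothetical; adjust them to
// the target site). `page` is the Puppeteer page opened above.
async function extractProduct(page) {
  const productDetail = { name: '', price: '' };
  productDetail.name = await page.$eval('h1.product-title', (el) => el.textContent.trim());
  productDetail.price = await page.$eval('span.price-value', (el) => el.textContent.trim());
  return productDetail;
}
```

In the real endpoint, the result would then be sent back with res.json(productDetail) and the browser shut down with await browser.close().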