Create a web scraper using bubble.io and puppeteer

Guide to creating a web scraper using bubble.io and puppeteer

Integrating code with the no-code tool Bubble.io: sounds contradictory? Take a look at how we developed a scraper using Bubble, Node.js and Ngrok.

Waleed Mudassar

May 17, 2022

The tools we use, briefly explained:

Bubble.io

The whole user experience and interaction of the application are built on bubble.io. The low-code platform provides visual elements to create the user interface, workflows to handle user inputs, and a database to store data such as the data scraped from an eCommerce site.

Bubble.io Plugin

When we hit bubble.io’s limits, we can extend it. One way is by developing a plugin. Within a plugin, you can execute custom code or create your own visual elements for the user interface. We’ll be using the API Connector, a plugin provided by Bubble.

Node

Node.js is a single-threaded, open-source, cross-platform runtime environment for building fast and scalable server-side and networking applications. It runs on the V8 JavaScript runtime engine, and it uses event-driven, non-blocking I/O architecture, which makes it efficient and suitable for real-time applications.

Puppeteer

Puppeteer is a Node library that provides a high-level API to control headless Chrome or Chromium browsers over the DevTools Protocol. It can also be configured to use full (non-headless) Chrome or Chromium.

Express

Express is a minimal and flexible Node.js web application framework. It lets you set up middleware to respond to HTTP requests and defines a routing table that maps each HTTP method and URL to an action.

Ngrok

Ngrok is a cross-platform application that enables developers to expose a local development server to the Internet with minimal effort. The software makes your locally hosted web server appear to be hosted on a subdomain of ngrok.com, meaning that no public IP or domain name is needed on the local machine.

VS Code

Visual Studio Code is a streamlined code editor with support for development operations like debugging, task running, and version control. It aims to provide just the tools a developer needs for a quick code-build-debug cycle and leaves more complex workflows to fuller-featured IDEs, such as Visual Studio IDE.

Pre-requisites

Download and Install

VS Code ( https://code.visualstudio.com/download )
Node ( https://nodejs.org/en/download/ )
Ngrok ( https://ngrok.com/download ); tutorial video: How to access localhost anywhere with ngrok

Environment Setup for web scraper

Create a folder on the desktop named “scraper”
Open VS Code
Click on File > Open Folder
Locate and select folder “scraper”
Click on “Select Folder”
Create a file named “index.js”
Go to the terminal and type “npm init -y”, press enter
Install puppeteer using command “npm install puppeteer”, press enter
Install express using the command “npm install express”, press enter

Let’s code

Open file “index.js”
Import the Puppeteer module within the “index.js” file

const puppeteer = require('puppeteer');

Import the Express framework within the “index.js” file

const express = require('express');

Instantiate the Express app

const app = express();

Set our port:

const port = 3000;

The port will be used a bit later when we tell the app to listen to requests.

Finalized selectors

Web Scraper uses CSS selectors to find HTML elements in web pages and extract data from them. When selecting an element the Web Scraper will try to make its best guess of what the CSS selector might be for the selected elements. But you can also write it yourself and test it by clicking “Element preview”.

const Selectors = {
  name: '.prod-subtitle',
  price: 'span.push-right:nth-child(1) > strong:nth-child(1)'
}

Create an empty object that will hold the scraped data and later be sent to Bubble as JSON

let productDetail = {
  name: '',
  price: ''
}

We need to keep in mind that Puppeteer is a promise-based library: It performs asynchronous calls to the headless Chrome instance under the hood. Let’s keep the code clean by using async/await. For that, we need to define an async function and put all the Puppeteer code in there.
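To make the promise-based style concrete, here is a minimal sketch that needs no browser. `fetchTitle` is a hypothetical stand-in for a Puppeteer call (it is not part of Puppeteer); it returns a promise, and `await` pauses the async function until that promise settles.

```javascript
// Hypothetical stand-in for an asynchronous Puppeteer call: resolves after a short delay.
function fetchTitle(url) {
  return new Promise(resolve => {
    setTimeout(() => resolve(`Title of ${url}`), 10);
  });
}

// Promise chaining: the follow-up step lives in a .then() callback.
function scrapeWithThen(url) {
  return fetchTitle(url).then(title => title.toUpperCase());
}

// async/await: the same logic, written as sequential statements.
async function scrapeWithAwait(url) {
  const title = await fetchTitle(url);
  return title.toUpperCase();
}

scrapeWithAwait('https://example.com').then(result => console.log(result));
```

Both functions return a promise with the same value; the async/await version simply reads top to bottom, which is why the handler in this guide is declared `async`.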

Define an HTTP GET endpoint to accept requests from the Bubble server

When a user hits the endpoint with a GET request, Express returns the JSON object to the Bubble application. The endpoint serves product data, so its path is /product:

app.get('/product', async (req, res) => {

Launch the browser

const browser = await puppeteer.launch()

Open a new tab

const page = await browser.newPage()

Puppeteer has a newPage() method that creates a new page instance in the browser, and these page instances can do quite a few things. In our route handler, we create a page instance and then use the page.goto() method to navigate to the target site.

Pass URL of target site

await page.goto(req.query.url)
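Because `req.query.url` comes straight from the request, it may be missing or malformed, and `page.goto()` would then reject. Below is a minimal sketch of a check that could run before navigation. `isHttpUrl` is a hypothetical helper name, built on Node's built-in `URL` class.

```javascript
// Hypothetical helper: returns true only for well-formed http(s) URLs.
function isHttpUrl(value) {
  if (typeof value !== 'string') return false;
  try {
    const parsed = new URL(value); // throws a TypeError on invalid input
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch (err) {
    return false;
  }
}

console.log(isHttpUrl('https://example.com/product/1')); // true
console.log(isHttpUrl('not a url'));                      // false
console.log(isHttpUrl(undefined));                        // false
```

Inside the route handler, a failed check could be answered with `res.status(400).json({ error: 'Invalid url parameter' })` before the browser is launched.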

Save scraped data from HTML element’s selector ( name of product ) into JSON object

productDetail.name = await page.$eval(Selectors.name, el=>el.textContent)

Save scraped data from HTML element’s selector ( price of product ) into JSON object

let price_ = await page.$eval(Selectors.price, el=>el.textContent)

“price_” is a temporary variable that holds the raw price text until it has been cleaned.
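`page.$eval()` rejects when no element matches the selector, and `textContent` often carries surrounding whitespace. A sketch of a tolerant wrapper follows. `textOrNull` is a hypothetical helper name, and `fakePage` is a stub standing in for a real Puppeteer page so the example runs without a browser.

```javascript
// Hypothetical helper: returns the trimmed text of the first match, or null on any failure.
async function textOrNull(page, selector) {
  try {
    const text = await page.$eval(selector, el => el.textContent);
    return text.trim();
  } catch (err) {
    return null; // e.g. $eval rejected because the selector matched no element
  }
}

// Stub that imitates the one page method used above (for illustration only).
const fakePage = {
  async $eval(selector, fn) {
    const elements = { '.prod-subtitle': { textContent: '  Sample product \n' } };
    if (!(selector in elements)) throw new Error(`no element matches ${selector}`);
    return fn(elements[selector]);
  },
};

textOrNull(fakePage, '.prod-subtitle').then(name => console.log(name)); // Sample product
textOrNull(fakePage, '.missing').then(name => console.log(name));       // null
```

Returning null hides the reason for the failure, so this only suits fields that are allowed to be absent.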

Here, we need to clean the price: on the target site the price has ‘,’ and ‘.’ swapped, so we replace the second ‘,’ in the string with ‘.’.

This step is specific to this site and is not required in general.

let t = 0;

price_ = price_.replace(/,/g, match => ++t === 2 ? '.' : match)
productDetail.price = price_;
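To see what this replacement does, here is the same logic wrapped in a function and applied to sample strings. `cleanPrice` and the sample prices are illustrative assumptions, not values taken from the target site.

```javascript
// Replaces only the second comma in the string with a period; everything else is unchanged.
function cleanPrice(rawPrice) {
  let t = 0; // counts the commas seen so far
  return rawPrice.replace(/,/g, match => (++t === 2 ? '.' : match));
}

console.log(cleanPrice('1,299,00')); // 1,299.00
console.log(cleanPrice('299,00'));   // 299,00 (only one comma, so nothing changes)
```

A price with a single comma passes through unchanged, so this cleanup only fits prices that contain two commas.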

Close browser

await browser.close()

Send JSON object to bubble as a response

res.json(productDetail);

Close the route handler

})
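As written, the steps above leave the browser running if `page.goto()` or `page.$eval()` throws, and the request never gets a response. Below is a sketch of the same handler with `try`/`catch`/`finally`. `makeProductHandler` is a hypothetical name, and passing `puppeteer` in as an argument is an assumption made here so the handler can be exercised without launching Chrome. The price cleanup is left out to keep the sketch short.

```javascript
const Selectors = {
  name: '.prod-subtitle',
  price: 'span.push-right:nth-child(1) > strong:nth-child(1)',
};

// Hypothetical factory: returns an Express-style handler that uses the given puppeteer module.
function makeProductHandler(puppeteer) {
  return async (req, res) => {
    let browser;
    try {
      browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto(req.query.url);
      const name = await page.$eval(Selectors.name, el => el.textContent);
      const price = await page.$eval(Selectors.price, el => el.textContent);
      res.json({ name, price });
    } catch (err) {
      // Report the failure to the caller instead of leaving the request hanging.
      res.status(500).json({ error: err.message });
    } finally {
      // Runs on success and on failure, so a launched browser is always closed.
      if (browser) await browser.close();
    }
  };
}
```

In index.js this could be registered as `app.get('/product', makeProductHandler(puppeteer))`.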

Let’s start listening for client requests

app.listen(port, () => {

console.log(`Example app listening on port ${port}`)

})

To run the application, open the terminal, type “node index.js”, and press enter

Complete Code


Hosting script on server (Ngrok)

Sign up for Ngrok

Go to “Your Authtoken”
Copy Token
Open Ngrok, type ngrok authtoken [Paste your Token]

ngrok authtoken 25pMZXFc3gQ3KAYhhclTu41LcS0_3u24V5PXcQUzLgQA29ApA

To expose a web server running on your local machine to the internet, type ngrok http [port number]; in this case, the port number is 3000

ngrok http 3000

Now that our script is ready and exposed to the internet through Ngrok, let’s design the UI on Bubble and test requests via the Bubble API Connector.

Designing UI on Bubble.io

Create a page, and place an Input Field and a Button on it.
The user will paste a link from the eCommerce site into the Input Field.

Get a plugin named “API Connector”
Set API Name “Puppeteer Scraper API”
Set Authentication “None or self-handled”
Create a call and set the name “GET”
Set Use as “Action”
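The Express handler reads the product link from `req.query.url`, so the API Connector call has to send it as a `url` query parameter on the `/product` path. Below is a sketch of how that request URL is composed. The ngrok address and shop link are placeholders, and `buildScraperRequestUrl` is a hypothetical helper, not part of Bubble or Ngrok.

```javascript
// Hypothetical helper: builds the GET URL that the API call should request.
function buildScraperRequestUrl(ngrokBaseUrl, productUrl) {
  const endpoint = new URL('/product', ngrokBaseUrl);
  // searchParams percent-encodes the product link so its own "?" and "&" survive.
  endpoint.searchParams.set('url', productUrl);
  return endpoint.toString();
}

// Placeholder values; use the forwarding address that ngrok prints for your session.
const requestUrl = buildScraperRequestUrl(
  'https://example.ngrok.io',
  'https://shop.example.com/item?id=42&ref=home'
);
console.log(requestUrl);
```

Encoding matters here: without it, the `&ref=home` part of the product link would be read as a second parameter of the scraper request rather than as part of the link.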
After pasting the link, the user clicks the “Calculate” button, which triggers the following event and its actions: