
In the previous article, we used Kubernetes to create the Huginn server and its database. In this article, I'm going to show you:

  1. How to use Huginn to search for apartments on Craigslist
  2. How to host Huginn and Nextcloud (from the article Self-Host Nextcloud Using Kubernetes) on the same cluster using an Ingress object

Source Code (Optional)

If you want to get your hands on the code, click on the GitHub link below. Be sure to check out the huginn tag (git checkout huginn) to view the finished code. Note that in future posts, I will add additional projects to the repo and make changes that aren't reflected in this post.

nick-true-dev/usable-k8s-projects
Three projects that showcase Kubernetes. Project 1 uses K8s to host a Nextcloud server. Project 2 uses K8s to host a Huginn server. And project 3 uses K8s to host a MonicaHQ server.

Expose Our Huginn Server Using an Ingress Object

Previously, we created the Huginn server and database. Now we want to test out the website. However, the server can't receive requests from outside the cluster. We are going to expose it using a Kubernetes Ingress object.

Here is the ingress config file from the article Self-Host Nextcloud Using Kubernetes. We used it to make the Nextcloud server accessible outside of the cluster.

cluster-ingress-1

Add the code in the red box to make the Huginn server externally accessible as well.

cluster-ingress-1a

Here's what that does. All requests sent to agents.mysite.test/ will get forwarded by the ingress to the huginn-server service at port 3000. (This assumes that agents.mysite.test resolves to the IP address of our Minikube cluster.)

cluster-ingress-1b
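To make the routing rule concrete, here is a minimal Python sketch of how an ingress controller picks a backend: it finds the rule whose host equals the request's Host header and whose path prefix matches the URL path, and the longest matching prefix wins. The huginn-server rule mirrors the description above. The Nextcloud host and service names are hypothetical placeholders (the real config is only shown in the screenshot), and a real controller such as the NGINX one behind Minikube's ingress addon does far more than this.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    host: str          # exact host name matched against the request's Host header
    path_prefix: str   # path prefix, compared element by element ("Prefix" path type)
    service: str       # backend Service name
    port: int          # backend Service port


# The Huginn rule follows the article; the Nextcloud rule is a HYPOTHETICAL placeholder.
RULES = [
    Rule("agents.mysite.test", "/", "huginn-server", 3000),
    Rule("cloud.mysite.test", "/", "nextcloud-server", 80),  # hypothetical names
]


def prefix_matches(prefix: str, path: str) -> bool:
    """Element-wise prefix match: "/app" matches "/app" and "/app/x" but not "/apple"."""
    trimmed = prefix.rstrip("/")
    return trimmed == "" or path == trimmed or path.startswith(trimmed + "/")


def route(host_header: str, path: str) -> Optional[tuple[str, int]]:
    """Return (service, port) for a request, or None if no rule matches."""
    host = host_header.split(":", 1)[0].lower()  # drop any ":port" suffix
    matches = [r for r in RULES if r.host == host and prefix_matches(r.path_prefix, path)]
    if not matches:
        return None  # a real controller would hand this to its default backend
    best = max(matches, key=lambda r: len(r.path_prefix))  # longest prefix wins
    return best.service, best.port


if __name__ == "__main__":
    print(route("agents.mysite.test", "/agents"))
```

Because the Huginn rule uses the prefix "/", every path under agents.mysite.test lands on huginn-server at port 3000, while requests for any other host name fall through to None.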

Name-Based Virtual Hosting

Broadly speaking, network traffic reaches a program based on the destination IP address and port. But this scheme prevents multiple server applications from sharing the same IP address and listening on the same port. For instance, you can't have two web servers on the same machine that both listen on port 80 for HTTP requests.

Name-based virtual hosting solves this problem. A single front-end server listens on the shared IP address and port, then forwards each request to the intended application based on the hostname in the request's Host header.

Here's the point: Kubernetes Ingress objects support name-based virtual hosting. Thus, you can host Huginn and Nextcloud on the same Minikube cluster, behind the same IP address and port.
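Here is a minimal, self-contained Python sketch of name-based virtual hosting using only the standard library. One HTTP server listens on a single address and port and chooses which "site" to answer with by reading the Host header. The two sites are hypothetical stand-ins for Huginn and Nextcloud, and the cloud.mysite.test name is an assumption for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# HYPOTHETICAL virtual hosts: host name -> response body.
SITES = {
    "agents.mysite.test": b"Hello from the Huginn stand-in",
    "cloud.mysite.test": b"Hello from the Nextcloud stand-in",
}


class VirtualHostHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Pick the "site" from the Host header; strip an optional ":port" suffix.
        host = (self.headers.get("Host") or "").split(":", 1)[0].lower()
        body = SITES.get(host, b"Unknown host")
        self.send_response(200 if host in SITES else 404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo's output quiet


def start_server() -> ThreadingHTTPServer:
    """Serve every virtual host from ONE address and port (port 0 = any free port)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), VirtualHostHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port: int, host: str) -> tuple[int, str]:
    """GET / from 127.0.0.1:port, naming `host` in the Host header."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/", headers={"Host": host})
        resp = conn.getresponse()
        return resp.status, resp.read().decode()
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_server()
    port = server.server_address[1]
    for name in ("agents.mysite.test", "cloud.mysite.test", "other.test"):
        print(name, fetch(port, name))
    server.shutdown()
    server.server_close()
```

All three requests go to the same IP address and port; only the Host header differs, and that alone decides which response comes back. An ingress controller does the same thing, except that it forwards the request to a Kubernetes service instead of answering it itself.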

Create the Ingress Object

In the previous post, I mentioned that Minikube does not have an ingress controller enabled by default. An ingress controller is a behind-the-scenes component that our cluster must have in order for Ingress objects to work. If you haven't already enabled ingress on Minikube, let's do so now.

minikube addons enable ingress

Now that the ingress controller is enabled, let's create the ingress object.

kubectl apply -f cluster-ingress.yaml
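Because the ingress controller routes on the Host header, you can check the new rule before editing any files: send a request straight to the Minikube IP and set the Host header yourself (with curl, that is the -H "Host: agents.mysite.test" option). Below is a minimal Python sketch of that check. It assumes the ingress controller answers plain HTTP on port 80 of the Minikube IP; the function and script names are hypothetical.

```python
import http.client
import sys


def probe_ingress(ip: str, host: str, path: str = "/", port: int = 80, timeout: float = 10.0) -> int:
    """Send GET `path` to ip:port with the given Host header and return the HTTP status.

    http.client does not follow redirects, so a 3xx status is reported as-is.
    """
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        conn.request("GET", path, headers={"Host": host})
        return conn.getresponse().status
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python probe_ingress.py "$(minikube ip)"
    print(probe_ingress(sys.argv[1], "agents.mysite.test"))
```

A 2xx or 3xx status suggests the request reached Huginn. A 404 usually means that no ingress rule matched the host name.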

Update the "hosts" File

In order to make agents.mysite.test resolve (on this computer) to the Minikube IP, we need to add the following entry to our hosts file.

<the minikube ip>  agents.mysite.test

To get the IP address, run the following command:

minikube ip

Since I'm on a Mac, I edit the hosts file using the following command:

sudo vim /etc/hosts

If you are unsure how to do this on your computer, check out this article. In the end, just make sure that your hosts file looks something like this.
term-1
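If you want to double-check the entry without opening an editor, this small Python sketch reads hosts-file text and reports which IP address a name maps to. It assumes the standard hosts-file layout, where each line starts with the IP address followed by one or more host names, and "#" starts a comment. The helper name is hypothetical, and /etc/hosts is the macOS/Linux location.

```python
from pathlib import Path
from typing import Optional


def resolve_from_hosts(hosts_text: str, hostname: str) -> Optional[str]:
    """Return the IP address that hosts-file text maps `hostname` to, or None.

    Each entry is "<ip> <hostname> [<alias> ...]"; comments start with "#",
    and the first matching entry wins.
    """
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and wanted in (name.lower() for name in fields[1:]):
            return fields[0]
    return None


if __name__ == "__main__":
    hosts = Path("/etc/hosts")  # macOS/Linux location; Windows keeps it elsewhere
    if hosts.exists():
        print(resolve_from_hosts(hosts.read_text(), "agents.mysite.test"))
```

An entry written in the wrong order (host name first, IP second) is not recognized, which is exactly how the operating system treats it too.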

Now we're ready to visit our Huginn website!
huginn-site-1

Use Huginn to Automatically Search Craigslist for Apartments

Huginn is a tool of many talents. Here are some examples of what it can do for you (from https://github.com/huginn/huginn):

  • Follow your project names on Twitter and get updates when people mention them.
  • Scrape websites and receive an email when they change.
  • Track counts of high frequency events and send an SMS within moments when they spike, such as the term "San Francisco emergency".

What we are going to do is create two Huginn agents: one that periodically searches Craigslist for apartments in San Francisco, and one that emails us the listings it finds.

Sign Up with Huginn

  1. Click on the Sign up button on the Home page.

  2. Copy the invitation code from the configMap...

    huginn-server-1g
    ...and paste it into the Invitation code field. (The purpose of the invitation code is to help you prevent random people from signing up.)

    huginn-site-1a

  3. Finally, fill in the rest of the fields and click the Sign up button.

Create Our Craigslist Website-Scraping Agent

  1. Click on the Agents tab.

    huginn-site-1b

    You will see a page with seven demo agents. Their purpose is to give some examples of what agents look like. You can play around with them or delete them as you like.

    huginn-site-1c

  2. Click on the New Agent button.

  3. Select Website Agent from the dropdown list. This type of agent is perfect for scraping a website and creating Huginn events based on what it finds.

  4. Give it the name "Apartment Lookout". Set the schedule to "Every 10m". And keep events (generated by this agent) for "1 hour".

    huginn-site-1d

  5. Scroll further down until you see the JSON config code.

    huginn-site-1e

    This is the default Website Agent config. It's just there to give you an example of what such an agent should look like. We are going to change it.

  6. Click the Toggle View link...

    huginn-site-1f

    ...and paste the following agent code over the default agent code:

    {
      "expected_update_period_in_days": "2",
      "url": "TODO",
      "type": "html",
      "mode": "on_change",
      "extract": {
        "price": {
          "css": ".result-meta .result-price",
          "value": "string(.)"
        },
        "title": {
          "css": ".result-info .result-title",
          "value": "string(.)"
        },
        "url": {
          "css": ".result-info .result-title",
          "value": "@href"
        }
      }
    }
    
  7. We need to insert the URL of the site that we want to scrape.

    1. In a new browser tab, go to Craigslist.

    2. Click on "apts/housing".

    3. Set the search criteria that interests you.

    4. Copy the URL.

      craigslist-1

    5. Go back to the Huginn page and paste the URL over "TODO" in "url": "TODO".

  8. Create the agent by clicking the Save button.

Note: The "Apartment Lookout" agent does not send notification emails. It creates an event containing the price, title, and URL of each matching listing that it finds.
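Conceptually, the extract section yields one list of matches per key (price, title, url), and the agent pairs up the i-th price, title, and url into one event. With "mode": "on_change", it only emits listings it hasn't already emitted recently. The Python sketch below imitates that behavior on already-extracted lists. It is a simplification for illustration, not Huginn's actual Ruby implementation; the function name is hypothetical and the sample listings in the usage are made up.

```python
from typing import Iterable


def build_events(extracted: dict[str, list[str]], previous: Iterable[dict[str, str]]) -> list[dict[str, str]]:
    """Zip per-key match lists into one event per listing, skipping repeats.

    Roughly mimics a Website Agent in "on_change" mode: every key must have the
    same number of matches, and a listing already present in `previous` is dropped.
    """
    lengths = {len(values) for values in extracted.values()}
    if len(lengths) != 1:
        raise ValueError(f"uneven number of matches: {sorted(lengths)}")
    seen = {tuple(sorted(event.items())) for event in previous}
    events = []
    for i in range(lengths.pop()):
        event = {name: values[i] for name, values in extracted.items()}
        fingerprint = tuple(sorted(event.items()))
        if fingerprint not in seen:  # "on_change": only emit listings not seen before
            seen.add(fingerprint)
            events.append(event)
    return events


if __name__ == "__main__":
    # Made-up sample data standing in for what the CSS selectors would extract.
    extracted = {
        "price": ["$3,000", "$2,500"],
        "title": ["Sunny 1BR", "Cozy studio"],
        "url": ["https://example.org/1", "https://example.org/2"],
    }
    first_run = build_events(extracted, [])
    print(first_run)                          # two new events
    print(build_events(extracted, first_run))  # nothing changed, so no events
```

If the selectors ever match a different number of prices and titles (for example, a listing with no price), the pairing breaks, which is why the sketch raises an error instead of guessing.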

Create the Notify-By-Email Agent

We need to create another Huginn agent that ingests the events created by the Craigslist scraping agent. This new agent's job is to send us a digest email notifying us of the latest matching listings.

  1. Click the New Agent button.

  2. Select Email Digest Agent from the dropdown.

  3. Name it "Apartment Digest Email".

  4. Schedule it to send the email every five minutes, if there are new listings to send.

  5. Set Keep events to "1 hour".

  6. Click on Sources and select "Apartment Lookout".

  7. Click on Toggle View.

    huginn-site-1g

  8. Paste the following JSON over the default Email Digest Agent config.

     {
       "subject": "New apartment(s) posted!",
       "headline": "Craigslist Apartment Posts:",
       "expected_receive_period_in_days": "2"
     }
    
  9. Click the Save button to create the agent.
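You can picture the Email Digest Agent as a queue: it collects incoming events between runs, and on each scheduled run it turns everything it has collected into a single email, then starts over. This minimal Python sketch models that idea, and it sends nothing when no new listings have arrived. The class name and the message layout (headline followed by one line per listing) are assumptions for illustration, not Huginn's actual email template.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DigestQueue:
    subject: str
    headline: str
    pending: list[dict[str, str]] = field(default_factory=list)

    def receive(self, event: dict[str, str]) -> None:
        """Collect an event (e.g. one listing from the Apartment Lookout agent)."""
        self.pending.append(event)

    def run(self) -> Optional[tuple[str, str]]:
        """On a scheduled run, return (subject, body) for one email, or None if idle."""
        if not self.pending:
            return None  # nothing new, so no email this time
        lines = [self.headline, ""]
        for e in self.pending:
            lines.append(f"{e.get('title', '')} {e.get('price', '')} {e.get('url', '')}".strip())
        self.pending.clear()  # each listing is reported only once
        return self.subject, "\n".join(lines)


if __name__ == "__main__":
    digest = DigestQueue("New apartment(s) posted!", "Craigslist Apartment Posts:")
    digest.receive({"title": "Sunny 1BR", "price": "$3,000", "url": "https://example.org/1"})
    print(digest.run())
    print(digest.run())  # queue is empty again
```

The subject and headline come from the JSON config above; the listing shown in the usage is made-up sample data.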

Ok. Our agents are ready to go!

Results

We could wait for the agents to run on their schedules. But why wait? Let's trigger them manually right now and test things out.

  1. Click on the Actions button for the Apartment Lookout agent, and click on the Run menu item.

    huginn-site-1h-1

  2. Click on the Actions button for the Apartment Digest Email agent, and click on the Run menu item to send the digest email.

    huginn-site-1i-3

  3. Check your email inbox. Note that you may have to look in your spam folder.

    huginn-site-1j

    Here's the digest email in all its glory!

Summary

We covered a lot in these last two posts. You learned a bunch of things, such as:

  • How to use a Kubernetes configMap to store non-sensitive environment variables
  • How to use a Kubernetes ingress object to enable your cluster to host multiple web servers at the same time (aka name-based virtual hosting)
  • How to use Amazon's Simple Email Service to send transactional emails
  • And how to use Huginn to periodically scrape a website

I hope you found these articles to be useful. Thanks for checking them out!