
Data searching and news reading without tracking and profiling

Posted at — Sep 1, 2020

It’s an undeniable fact that big market players like Google and Microsoft have their services almost everywhere on the Internet, from search engines, through mail services, to news aggregators. They are accessible, easy to use and free of charge. But are they really free? Not really: your privacy is the price. These companies collect data about what you look for on the Internet, they have access to your e-mails, and unfortunately the news they serve is not picked at random but based on your digital profile. Yes, they profile you and show you the content they want you to see.
If the above works for you, go ahead, be as productive as possible and get the job done. If you care about what your digital footprint looks like, it may be worth considering a few steps to protect your privacy while staying productive.

Search engines

Search engines like DuckDuckGo, Startpage and Qwant can be an answer. But why not have all of them, and more, in one solution? Some time ago I came across something called Searx, and it turned out to be exactly what I was looking for. It aggregates many search engines and doesn’t profile you: every request you send through Searx reaches the underlying search engines as a request from some IP address, but it isn’t tied to your name or to data that could directly identify you. Can it get better? Yes: you can host it on your own server, and then you are the only person who sees the inbound and outbound traffic. To the search engines you are just an IP address, and that’s it. There are many public Searx instances, but I decided to run my own, mostly for educational reasons; I like to understand how things work from scratch.

I decided to go with the installation path with full proxification, including Filtron and Morty, as described here, using the Nginx setup. The Apache path is very similar: you need Apache installed, the Apache certbot plugin, and a web config equivalent to searx.conf (a few paragraphs below) written in Apache syntax; a rough sketch follows below.
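For reference, here is a rough sketch of what an equivalent Apache virtual host could look like. It is only my approximation of the Nginx config below, not a tested setup, and it assumes the mod_proxy, mod_proxy_http, mod_headers and mod_alias modules are enabled and the same local ports as in the Nginx config:

<VirtualHost *:80>
        ServerName example.com

        ProxyPreserveHost On

        # static files are served directly from disk, so exclude them from the proxy
        ProxyPass /searx/static !
        Alias /searx/static /usr/local/searx/searx-src/searx/static
        <Directory /usr/local/searx/searx-src/searx/static>
            Require all granted
        </Directory>

        # searx behind Filtron (more specific paths must come before the catch-all "/")
        ProxyPass        /searx http://127.0.0.1:4004
        ProxyPassReverse /searx http://127.0.0.1:4004
        RequestHeader    set X-Script-Name /searx

        # Morty result proxy
        ProxyPass        /morty http://127.0.0.1:3000
        ProxyPassReverse /morty http://127.0.0.1:3000

        # searx web application itself
        ProxyPass        / http://127.0.0.1:8888/
        ProxyPassReverse / http://127.0.0.1:8888/
</VirtualHost>

For HTTPS, certbot’s Apache plugin (sudo certbot --apache) plays the same role as certbot --nginx does later in this post.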

At a high level, installation on Debian 10 and Ubuntu 18.04 or higher goes as follows:

Prerequisites:

  1. Nginx
  2. Certbot
  3. python-certbot-nginx or python3-certbot-nginx (the package name depends on the Debian or Ubuntu version), necessary for generating the Let’s Encrypt SSL certificate.

You need to be root or a user with sudo privileges.
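On a fresh Debian or Ubuntu box the prerequisites can be installed along these lines (adjust the certbot plugin package name to your release, as noted above):

sudo apt update
sudo apt install nginx certbot python3-certbot-nginx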

Step 1

git clone https://github.com/searx/searx searx
cd searx

Step 2

sudo -H ./utils/searx.sh install all

Step 3

sudo -H ./utils/filtron.sh install all

Step 4

sudo -H ./utils/morty.sh install all

Then, following this Nginx setup, create the Nginx config file in /etc/nginx/sites-available/ and link it into /etc/nginx/sites-enabled/ (the exact commands are shown after the config below).

searx.conf content

server {
        listen 80;
        server_name example.com;

        location / {
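            # searx web application, listening locally on its default port 8888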
            proxy_pass http://127.0.0.1:8888;
        
            proxy_set_header Host $host;
            proxy_set_header Connection       $http_connection;
            proxy_set_header X-Forwarded-For  $proxy_add_x_forwarded_for;
            proxy_set_header X-Scheme         $scheme;
            proxy_buffering                   off;
        }
        
        location /searx {
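            # searx behind Filtron, the filtering reverse proxy installed in Step 3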
            proxy_pass         http://127.0.0.1:4004/;
        
            proxy_set_header   Host             $http_host;
            proxy_set_header   Connection       $http_connection;
            proxy_set_header   X-Real-IP        $remote_addr;
            proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
            proxy_set_header   X-Scheme         $scheme;
            proxy_set_header   X-Script-Name    /searx;
        }
        
        location /searx/static {
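            # serve searx static files directly from disk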
            alias /usr/local/searx/searx-src/searx/static;
        }
        
        location /morty {
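            # Morty, the result proxy installed in Step 4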
            proxy_pass         http://127.0.0.1:3000/;
        
            proxy_set_header   Host             $http_host;
            proxy_set_header   Connection       $http_connection;
            proxy_set_header   X-Real-IP        $remote_addr;
            proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
            proxy_set_header   X-Scheme         $scheme;
        }
}
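
With searx.conf saved in /etc/nginx/sites-available/, it can be enabled and Nginx reloaded, for example:

sudo ln -s /etc/nginx/sites-available/searx.conf /etc/nginx/sites-enabled/searx.conf
sudo nginx -t
sudo systemctl reload nginx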

To use an encrypted HTTPS connection on the Searx instance, run:

sudo certbot --nginx

Searx customization

In /etc/searx/settings.yml:

# To enable autocomplete, the option below needs to be filled in
# Startpage is only an example
autocomplete : "startpage"

# Change the theme to dark
oscar_style : logicodev-dark

In the server section I set image_proxy to False, because with it set to True images sometimes didn’t load on the results page.

#In section:
server:
    image_proxy : False

In the result_proxy section I had to change the default url from http to https; without this, some search results caused the rendered pages to lose encryption.

#In section:
result_proxy:
    url: https://ip_address/morty/

I haven’t gone into much detail on the rest of the Searx configuration, but the above should be a good starting point for anyone interested.
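
One practical note: after editing /etc/searx/settings.yml the searx service has to be restarted for the changes to take effect. The exact service name depends on how the searx.sh installer set things up on your system; it typically runs searx under uwsgi, so something along these lines should do it (adjust to your setup):

sudo systemctl restart uwsgi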

News reading

I use a very simple approach to get the news I want to read without the risk of being distracted or reading articles I didn’t want to see. Most websites and more complex services provide a data feed using RSS. It’s how people consumed news many years ago, and a sizeable part of society still does.

I use an RSS reader fed with chosen links from the different pages I’m interested in, so I get only updates from those sources, without all the other things that “attack” people on the Internet today.

Even if you can’t live without platforms like Facebook, Twitter or Reddit, you can still get news from them via RSS feeds without using them directly. One more thing: when you use a terminal-based RSS reader, as I do, it’s even less distracting and easier on your eyes. Believe me, just try it.
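
As an example only (there are several terminal readers, this is just one of them): with Newsboat the whole setup is a plain-text list of feeds in ~/.newsboat/urls, and even Reddit exposes an RSS feed per subreddit:

# ~/.newsboat/urls - one feed URL per line, optional tags in quotes
# (blog.example.com is a placeholder, replace it with feeds you care about)
https://blog.example.com/index.xml "tech"
https://www.reddit.com/r/linux/.rss "reddit"

Running newsboat then fetches updates from exactly those sources and nothing else.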

It’s not worth wasting your time browsing everything and constantly closing unnecessary pop-ups and windows. Choose what matters to you and get your news the way it was invented a long time ago, a way that is still valid, and not without reason.

by Pawel Zelawski