I apologize in advance for my bad English.
I’ve created a simple training service in Go that supports a login and registration system backed by MongoDB. The service lets a logged-in user scrape rooms for rent in London in a specified location. Now I want to implement notifications for logged-in users about new rooms in the locations they have marked. My first idea was to run a background process that scrapes rooms every 30 seconds, saves the results (in Mongo, in cookies, or somewhere else, please advise), matches the new scrape results against the previous ones, and saves the differences (the new rooms) in the DB for later delivery to users in some form (email or a list on an HTML page).
Is my idea about notifications generally correct? If not, please describe a better way to do this or point me to some related examples.
What is the best way to run that background process in Go?
It would be great if you could point me to some examples related to this case.
Demo of the service on Heroku
I appreciate your help.
I don’t have any examples for what you are asking, and I’m sure someone will correct me if I’m wrong, but I personally think it would be fine to poll for new rooms.
Once new rooms are found, you could store the results in the database, and any time users check for new pages, just load them from there.
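The “match new results against previous” step you describe can be a simple set difference keyed on something unique per listing. A minimal sketch, where `Room` and its fields are placeholders rather than names from your actual service:

```go
package main

import "fmt"

// Room is a minimal placeholder for a scraped listing; the field names
// are illustrative, not taken from the original service.
type Room struct {
	ID    string // e.g. the listing URL, assumed unique per room
	Title string
}

// newRooms returns the rooms in latest that were not present in previous.
// This is the diff step: anything not seen before counts as "new".
func newRooms(previous, latest []Room) []Room {
	seen := make(map[string]bool, len(previous))
	for _, r := range previous {
		seen[r.ID] = true
	}
	var fresh []Room
	for _, r := range latest {
		if !seen[r.ID] {
			fresh = append(fresh, r)
		}
	}
	return fresh
}

func main() {
	prev := []Room{{ID: "a", Title: "Camden studio"}}
	curr := []Room{{ID: "a", Title: "Camden studio"}, {ID: "b", Title: "Hackney flat"}}
	fmt.Println(newRooms(prev, curr)) // only the Hackney flat is new
}
```

If you store rooms in Mongo, a unique index on the same ID field plus upserts would give you the same effect without keeping the previous scrape in memory.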
Between scrapes, though, it might make more sense to keep the new rooms in a cache so the DB isn’t constantly hit during updates.
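Since the scraper goroutine writes while HTTP handlers read, that cache needs to be safe for concurrent use. One rough way to do it (all names hypothetical) is a small struct guarded by a `sync.RWMutex`:

```go
package main

import (
	"fmt"
	"sync"
)

// RoomCache is a sketch of an in-memory cache that sits in front of the
// database: page handlers read from it, and the scraper goroutine
// replaces its contents after each run.
type RoomCache struct {
	mu    sync.RWMutex
	rooms []string // room IDs (or serialized listings)
}

// Set replaces the cached rooms; called by the scraper after each scrape.
func (c *RoomCache) Set(rooms []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.rooms = rooms
}

// Get returns a copy of the cached rooms; called by HTTP handlers.
func (c *RoomCache) Get() []string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	out := make([]string, len(c.rooms))
	copy(out, c.rooms) // copy so callers can't mutate the cached slice
	return out
}

func main() {
	var c RoomCache
	c.Set([]string{"room-1", "room-2"})
	fmt.Println(c.Get())
}
```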
Either way, maybe someone else has some good suggestions, but for now it seems OK to me.
As for starting the background process: before you start your server, just run a goroutine that does the polling and DB/cache updates.
Also discussed on golang-nuts.
@jopoleon I think it’s best to rethink your current implementation. As @radovskyb said, you should be continuously scraping your source at an interval that matches the source’s terms of service (email them about access rates if they don’t already provide an API). If you’re doing a full client-server-source request for every check, you are going to run into a few issues in the long term:
- The source may block you for abuse if you don’t rate limit.
- You can be DDoS’d: all your connections will have high latency, and any memory they use will persist until the connection closes or times out.
- The availability of your site is determined by the availability of the sites being scraped. Again, you can use timeouts, but if there are 10 sources to check, you will be hard pressed to get all 10 responses without annoying your end users.
As for implementing this: just spawn a goroutine before setting up your HTTP endpoints, and use context.Context for timeouts and signaling. Example here: https://gist.github.com/thetooth/06f6952345dd357b3ca22c3327f57e87 — it continually calls “fetch” until the parent cancels the context. Each call to fetch has a timeout; when the timeout passes, it returns an error.
That should get you started.