Filling login forms automatically

buen articulo

The Scrapinghub Blog

We often have to write spiders that need to login to sites, in order to scrape data from them. Our customers provide us with the site, username and password, and we do the rest.

The classic way to approach this problem is:

  1. launch a browser, go to site and search for the login page
  2. inspect the source code of the page to find out:
    1. which one is the login form (a page can have many forms, but usually one of them is the login form)
    2. which are the field names used for username and password (these could vary a lot)
    3. if there are other fields that must be submitted (like an authentication token)
  3. write the Scrapy spider to replicate the form submission using FormRequest (here is an example)

Being fans of automation, we figured we could write some code to automate point 2 (which is actually the most time-consuming)…

Ver la entrada original 141 palabras más

Anuncios
Publicado en Uncategorized

Responder

Introduce tus datos o haz clic en un icono para iniciar sesión:

Logo de WordPress.com

Estás comentando usando tu cuenta de WordPress.com. Cerrar sesión / Cambiar )

Imagen de Twitter

Estás comentando usando tu cuenta de Twitter. Cerrar sesión / Cambiar )

Foto de Facebook

Estás comentando usando tu cuenta de Facebook. Cerrar sesión / Cambiar )

Google+ photo

Estás comentando usando tu cuenta de Google+. Cerrar sesión / Cambiar )

Conectando a %s

A %d blogueros les gusta esto: