Python BeautifulSoup package

BeautifulSoup is a Python library for extracting data from HTML and XML files. It works on top of several parsers, such as Python's built-in html.parser, lxml, and html5lib.
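A minimal sketch of the core lookup calls (find, find_all, select_one), parsing an inline HTML string with the built-in html.parser so no extra parser package is needed. The markup below is a made-up example, not from any real page; "lxml" or "html5lib" could be passed as the parser name instead if those packages are installed.

```python
from bs4 import BeautifulSoup

# Made-up sample markup, used only for illustration
html = """
<ul id="links">
  <li><a href="/docs" class="nav">Docs</a></li>
  <li><a href="/blog" class="nav">Blog</a></li>
</ul>
"""

# "html.parser" ships with Python; "lxml" or "html5lib" also work if installed
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None if nothing matches
first_link = soup.find("a")
print(first_link.get_text(), first_link["href"])  # Docs /docs

# find_all() returns a list of every matching tag (class_ avoids the "class" keyword)
hrefs = [a["href"] for a in soup.find_all("a", class_="nav")]
print(hrefs)  # ['/docs', '/blog']

# select_one() / select() accept CSS selectors
blog_text = soup.select_one("ul#links a[href='/blog']").get_text()
print(blog_text)  # Blog
```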

Extract the CSRF token

The example below assumes the CSRF token is stored in the value attribute of an input element whose name attribute is "csrf", as in this sample page:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Test doc</title>
  </head>
  <body>
    <h1>Login</h1>
    <section>
      <form class="login-form" method="POST" action="/login">
        <input
          required
          type="hidden"
          name="csrf"
          value="23lskjfklskjfajsdlfadflkjwrewore"
        />
        <label>Username</label>
        <input required type="text" name="username" autofocus />
        <label>Password</label>
        <input required type="password" name="password" />
        <button class="button" type="submit">Log in</button>
      </form>
    </section>
  </body>
</html>

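from bs4 import BeautifulSoup
import requests

# Fetch the page; verify=False disables TLS certificate checks and only has an effect for https:// URLs
resp = requests.get("http://localhost:8080/index.html", verify=False)
resp.raise_for_status()  # stop early on a 4xx/5xx response
soup = BeautifulSoup(resp.text, "html.parser")
# Find the first <input> with name="csrf" and read its value attribute
csrf_token = soup.find("input", {"name": "csrf"})["value"]
print(f"[+] csrf token: {csrf_token}")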

# output
# [+] csrf token: 23lskjfklskjfajsdlfadflkjwrewore
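To try the lookup without a running server, a trimmed version of the sample form can be parsed straight from a string. This sketch also guards against a missing token: soup.find() returns None when nothing matches, and indexing None with ["value"] raises a TypeError. The helper name extract_csrf_token is hypothetical, and the CSS attribute selector it uses is one equivalent alternative to the find() call above.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Trimmed version of the sample login form, embedded so no server is needed
SAMPLE_HTML = """
<form class="login-form" method="POST" action="/login">
  <input required type="hidden" name="csrf" value="23lskjfklskjfajsdlfadflkjwrewore" />
  <input required type="password" name="password" />
</form>
"""


def extract_csrf_token(html: str, field_name: str = "csrf") -> Optional[str]:
    """Hypothetical helper: return the value of the <input> named field_name, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS attribute selector; equivalent to soup.find("input", {"name": field_name})
    tag = soup.select_one(f'input[name="{field_name}"]')
    # tag.get() returns None instead of raising KeyError if "value" is absent
    return tag.get("value") if tag is not None else None


token = extract_csrf_token(SAMPLE_HTML)
print(f"[+] csrf token: {token}")
print(extract_csrf_token(SAMPLE_HTML, "missing"))  # None, since no such input exists
```

Returning None rather than raising lets the caller decide whether a missing token is an error, for example when the page layout changes.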
