
Python BeautifulSoup package

BeautifulSoup is a Python library for extracting data from HTML and XML files. It works on top of several parsers, including Python's built-in html.parser, lxml, and html5lib.

Extract the CSRF token

The sample login page below stores its CSRF token in the value attribute of a hidden input element whose name attribute is csrf:

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Test doc</title>
</head>
<body>
<h1>Login</h1>
<section>
<form class="login-form" method="POST" action="/login">
<input
required
type="hidden"
name="csrf"
value="23lskjfklskjfajsdlfadflkjwrewore"
/>
<label>Username</label>
<input required type="text" name="username" autofocus />
<label>Password</label>
<input required type="password" name="password" />
<button class="button" type="submit">Log in</button>
</form>
</section>
</body>
</html>
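
The following script fetches the page and prints the token:

import requests
from bs4 import BeautifulSoup

# Plain HTTP on localhost, so no TLS verification settings are needed
resp = requests.get("http://localhost:8080/index.html", timeout=10)
resp.raise_for_status()  # stop early on 4xx/5xx responses
soup = BeautifulSoup(resp.text, "html.parser")
# find() returns None when nothing matches, so check before indexing
csrf_input = soup.find("input", {"name": "csrf"})
if csrf_input is None:
    raise SystemExit("[-] csrf input not found")
csrf_token = csrf_input["value"]
print(f"[+] csrf token: {csrf_token}")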
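
The same lookup can be tried without a running server by parsing the form from a string. The sketch below is only an illustration: the helper name extract_csrf_token, its field_name parameter, and the SAMPLE_HTML constant are hypothetical names that do not come from the original page. It uses select_one with a CSS attribute selector as an alternative to find, and returns None instead of raising when the field or its value is missing.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Trimmed copy of the login form from the sample page, so no server is needed.
# SAMPLE_HTML is a hypothetical name used only for this sketch.
SAMPLE_HTML = """
<form class="login-form" method="POST" action="/login">
  <input required type="hidden" name="csrf" value="23lskjfklskjfajsdlfadflkjwrewore" />
  <input required type="password" name="password" />
</form>
"""


def extract_csrf_token(html: str, field_name: str = "csrf") -> Optional[str]:
    """Return the value of the first <input> named field_name, or None if absent.

    Hypothetical helper: the function name and parameter are assumptions.
    """
    soup = BeautifulSoup(html, "html.parser")
    # CSS attribute selector; matches the same element as
    # soup.find("input", {"name": field_name})
    element = soup.select_one(f'input[name="{field_name}"]')
    if element is None:
        return None
    # .get() returns None instead of raising KeyError when "value" is missing
    return element.get("value")


token = extract_csrf_token(SAMPLE_HTML)
print(f"[+] csrf token: {token}")
```

Returning None leaves it to the caller to decide whether a missing token is an error, which can be convenient when the same helper is run against pages that may not contain the form.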