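When writing a crawler, you may run into encrypted data or obfuscated JS, so the response cannot be read as plaintext. This post works through an example of reverse engineering the JS to recover the plaintext.
Search the front-end code for the keyword "encrypt_data", which appears in the response, to locate the key function.
Stepping through with the debugger leads to the function Z: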
function Z(f) {
return JSON.parse(W("sjdqmp20161205#_316@gfmt", J.decode(f), 0, 0, "012345677890123", 1))
}
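Read from the inside out, Z does three things: J.decode turns the Base64 text into raw bytes, W decrypts them with the hard-coded key, and JSON.parse turns the plaintext into an object. A minimal Python sketch of that order of operations follows. The names decrypt_payload and block_decrypt are hypothetical, and the cipher is passed in as a callable because the source does not name the algorithm that W implements.

```python
import base64
import json
from typing import Any, Callable

# Key string copied from the Z function above.
KEY = "sjdqmp20161205#_316@gfmt"


def decrypt_payload(encoded: str, block_decrypt: Callable[[bytes, bytes], bytes]) -> Any:
    """Mirror Z: Base64-decode, decrypt, then parse the JSON plaintext.

    block_decrypt(key, data) is a hypothetical stand-in for W; it is
    assumed to return plaintext bytes with any padding already removed.
    """
    raw = base64.b64decode(encoded)               # J.decode(f)
    plain = block_decrypt(KEY.encode(), raw)      # W(key, data, ...)
    return json.loads(plain.decode("utf-8"))      # JSON.parse(...)
```

Passing an identity function as block_decrypt is a quick way to check the plumbing before a real cipher is plugged in.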
Z calls the functions W and J, so those need to be analyzed and debugged next.
The W function is as follows:
function W(f, c, s, v, C, h) {
var m = new Array(16843776,0,65536,16843780,16842756,66564,4,65536,1024,16843776,16843780,1024,16778244,16842756,16777216,4,1028,16778240,16778240
// too many values to list, omitted…
y = y.replace(/\0*$/g, ""),
!s) {
if (h === 1) {
var E = y.length
, O = 0;
E && (O = y.charCodeAt(E - 1)),
O <= 8 && (y = y.substring(0, E - O))
}
y = decodeURIComponent(escape(y))
}
return y
}
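The visible tail of W is post-processing that runs only when s is falsy (Z passes 0). Trailing NUL characters are removed. If h is 1, the last character is read as a padding length and, when it is at most 8, that many characters are cut off. Finally decodeURIComponent(escape(y)) reinterprets the byte string as UTF-8. The first table values (16843776 = 0x1010400, 16843780 = 0x1010404, ...) look like the first S-box table of a commonly circulated JavaScript DES implementation, but that identification is an inference and not something the source states. Below is a rough Python sketch of the post-processing only. finish_plaintext is a hypothetical name.

```python
def finish_plaintext(raw: bytes, strip_padding: bool = True) -> str:
    """Approximate the decrypt-side cleanup at the end of W."""
    # y.replace(/\0*$/g, ""): drop trailing NUL bytes.
    raw = raw.rstrip(b"\x00")
    if strip_padding:  # corresponds to h === 1
        # Read the last byte as the pad length (0 if nothing is left).
        pad = raw[-1] if raw else 0
        # O <= 8 && (y = y.substring(0, E - O)); substring clamps at 0.
        if pad <= 8:
            raw = raw[: max(len(raw) - pad, 0)]
    # decodeURIComponent(escape(y)): treat the bytes as UTF-8 text.
    return raw.decode("utf-8")
```

For example, the 8-byte block made of the three UTF-8 bytes of "中" followed by five 0x05 bytes comes back as the single character "中".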
The J.decode method:
function(t) {
t = String(t).replace(R, "");
var n = t.length;
n % 4 == 0 && (t = t.replace(/==?$/, ""),
n = t.length),
(n % 4 == 1 || /[^+a-zA-Z0-9/]/.test(t)) && F("Invalid character: the string to be decoded is not correctly encoded.");
for (var i = 0, A, r, p = "", d = -1; ++d < n; )
r = w.indexOf(t.charAt(d)),
A = i % 4 ? A * 64 + r : r,
i++ % 4 && (p += String.fromCharCode(255 & A >> (-2 * i & 6)));
return p
}
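J.decode is a plain Base64 decoder. It cleans the input with R, drops trailing "=" padding, rejects characters outside the Base64 alphabet, and then packs every four 6-bit values into three bytes. Assuming R strips whitespace and w is the standard alphabet (neither definition is shown above), Python's base64 module should yield the same bytes. js_style_b64decode is a hypothetical helper name, and it returns bytes where the JS returns a binary string.

```python
import base64
import re


def js_style_b64decode(text: str) -> bytes:
    """Decode Base64 the way J.decode appears to, tolerating missing '='."""
    # Assumption: R removes whitespace characters.
    text = re.sub(r"\s+", "", str(text))
    # Drop trailing padding when the length is a multiple of 4.
    if len(text) % 4 == 0:
        text = re.sub(r"==?$", "", text)
    # Same validity check as the JS: bad length or foreign characters.
    if len(text) % 4 == 1 or re.search(r"[^+a-zA-Z0-9/]", text):
        raise ValueError("the string to be decoded is not correctly encoded")
    # Restore padding so the standard-library decoder accepts the input.
    return base64.b64decode(text + "=" * (-len(text) % 4))
```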
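These are all the functions and methods the front end uses to obfuscate the data. Copy them into a JS file.
Running that file with Node.js successfully decrypts the response into plaintext.
Automated batch queries
Use Python to fetch the ciphertext, then execute the JS file from Python, passing in the ciphertext to get the result: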
#!/usr/bin/env python3
#coding:utf-8
import requests
import execjs
headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:128.0) Gecko/20100101 Firefox/128.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/png,image/svg+xml,*/*;q=0.8',
'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
'Connection': 'keep-alive',
'Upgrade-Insecure-Requests': '1',
'Sec-Fetch-Dest': 'document',
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-Site': 'none',
'Sec-Fetch-User': '?1',
'Priority': 'u=0, i',
'Pragma': 'no-cache',
'Cache-Control': 'no-cache',
}
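# url must be set to the endpoint that returns the encrypted JSON (not given in this post).
url = ""
data = requests.get(url, headers=headers).json()
# Load the extracted JS (Z, W, J) and call Z with the ciphertext.
with open("./1.js", "r", encoding="utf-8") as fh:
    ctx = execjs.compile(fh.read())
decodeData = ctx.call('Z', data['encrypt_data'])
print(decodeData)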
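The script above handles a single response. For batch queries it is usually cheaper to compile the JS once and reuse it for every ciphertext. The sketch below shows only that loop. decrypt_all is a hypothetical helper, and the decrypt function is passed in so the example runs without Node.js. In practice it could be a lambda that calls 'Z' on the object returned by execjs.compile.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def decrypt_all(
    ciphertexts: Iterable[str],
    decrypt: Callable[[str], Any],
) -> Tuple[List[Any], Dict[int, str]]:
    """Decrypt each ciphertext, collecting failures instead of aborting."""
    results: List[Any] = []
    errors: Dict[int, str] = {}
    for index, text in enumerate(ciphertexts):
        try:
            results.append(decrypt(text))
        except Exception as exc:  # keep going; record which item failed
            errors[index] = str(exc)
    return results, errors
```

Recording failures by index makes it easy to retry only the items that could not be decrypted.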