Scraping is the process of downloading basic information about a file, such as the director, actors, description, and cover image, from relevant websites, based on the file's filename, code, or title.
In principle, different file types require different scraping solutions. Currently, PLM fully supports scraping only for movies and Japanese AV videos, with limited support for books and music (using Douban).
Scraping works by searching the specified websites for the title or code of a file, extracting and interpreting the information on the resulting pages, and saving it in the file record. To prevent this kind of automated access, many websites, such as JavDB, restrict IP addresses that query too frequently. Although PLM does not access these websites with the intention of scraping, its behavior is similar. Therefore, if you use PLM to send many requests in a short period of time to websites with anti-scraping mechanisms, you may encounter time-limited access restrictions or even be blacklisted. It is strongly recommended not to batch scrape too many records per session or per day, and to consider using a VPN: after completing one batch, connect to a different VPN server to obtain a different IP address before starting the next batch.
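The batching advice above can be pictured with a minimal Python sketch. It is only an illustration of splitting a large selection into small batches and is not part of PLM; the function name batches and the batch size are hypothetical.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(records: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `size` records (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(records), size):
        yield list(records[start:start + size])


# Example: 7 records split into batches of 3. Between batches, the user would
# pause and, if desired, switch the VPN server before continuing.
for batch in batches(["rec1", "rec2", "rec3", "rec4", "rec5", "rec6", "rec7"], 3):
    print(batch)
```

Keeping each batch small limits how many requests a website sees from one IP address in a short period.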
Some websites (e.g., JavLibrary and JavBus) require users to manually verify their age or agree to terms during the first scraping session, and some (e.g., JavLibrary) require this for every batch. To perform the consent action, click the "Website" button when selecting a scraper. PLM will also attempt to determine automatically whether manual intervention is required.
Given that many downloaded movie filenames contain various tags, it is recommended to use the AI title acquisition operation to clean these filenames and obtain reasonable movie titles. PLM will prioritize using the title field content for scraping, followed by the filename.
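The priority described above (title field first, then filename) can be sketched as follows. This is an assumption-laden illustration, not PLM's actual code: the FileRecord class, its fields, and the search_term function are hypothetical names, and dropping the file extension from the fallback is also an assumption.

```python
import os
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical stand-in for a PLM file record."""
    filename: str
    title: str = ""


def search_term(record: FileRecord) -> str:
    """Return the text to search for: the title if set, otherwise the filename."""
    title = record.title.strip()
    if title:
        return title
    # Fall back to the filename without its extension (assumed behavior).
    stem, _extension = os.path.splitext(record.filename)
    return stem


print(search_term(FileRecord("Some.Movie.2019.1080p.mkv", title="Some Movie")))
print(search_term(FileRecord("Some.Movie.2019.1080p.mkv")))
```

The second call shows why cleaning matters: without a title, the tags in the filename end up in the search text.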
After scraping is complete, if the scraped information is not in the language you want, it is recommended to edit the file and use the translate button to translate the descriptions, actors, director, tags, etc.
When scraping starts, a list of scrapers first pops up for the user to select from. Multiple scrapers can be selected at the same time (use the Up/Down buttons to adjust their order). If the first scraper fails or does not find the information, PLM tries the other selected scrapers in order. If a profile exists, you can choose it in the lower right corner of the dialog box to select scrapers quickly. You can click the "Website" button to visit the website or perform consent actions.
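The sequential fallback between selected scrapers can be sketched roughly as below. The sketch assumes each scraper is a function that takes the search text and returns a dictionary of fields, or None when nothing is found; these signatures, the scraper names, and the function scrape_with_fallback are hypothetical and do not describe PLM's internal or scripting interface.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical scraper signature: search text in, field dictionary (or None) out.
Scraper = Callable[[str], Optional[Dict[str, str]]]


def scrape_with_fallback(
    query: str, scrapers: Sequence[Tuple[str, Scraper]]
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Try each selected scraper in order; return the first non-empty result."""
    for name, scraper in scrapers:
        try:
            info = scraper(query)
        except Exception:
            # A failing scraper is skipped and the next one is tried.
            continue
        if info:
            return name, info
    return None


def broken_site(query: str) -> Optional[Dict[str, str]]:
    raise ConnectionError("site unreachable")


def empty_site(query: str) -> Optional[Dict[str, str]]:
    return None


def working_site(query: str) -> Optional[Dict[str, str]]:
    return {"title": query, "director": "Unknown"}


selected = [("first", broken_site), ("second", empty_site), ("third", working_site)]
print(scrape_with_fallback("Some Movie", selected))
```

In this sketch the order of the list plays the role of the order set with the Up/Down buttons.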
Users can write their own scripts for scraping. Refer to the built-in examples in $InstallationFolder\scraper\javdb.pas, javhub.pas, etc.
Currently supported scrapers in PLM:
If a file has already been scraped, PLM will not perform actual scraping operations on it unless "Force Redo" is checked when selecting the "Scrape" action, or the relevant data is cleared by clicking the "Clear Info" button when editing the file.
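The skip rule above can be expressed as a small decision function. This is a minimal sketch under stated assumptions: the FileRecord class, its info dictionary, and the names clear_info and needs_scraping are hypothetical, and "already scraped" is modeled simply as "the record holds some scraped data".

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FileRecord:
    """Hypothetical record; `info` holds previously scraped fields."""
    info: Dict[str, str] = field(default_factory=dict)

    def clear_info(self) -> None:
        """Models the effect of the "Clear Info" button."""
        self.info.clear()


def needs_scraping(record: FileRecord, force_redo: bool = False) -> bool:
    """Scrape when forced, or when the record holds no scraped data yet."""
    return force_redo or not record.info


record = FileRecord(info={"director": "Someone"})
print(needs_scraping(record))                   # already scraped, not forced
print(needs_scraping(record, force_redo=True))  # "Force Redo" checked
record.clear_info()
print(needs_scraping(record))                   # data cleared
```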