Can someone help me scrape links from the Microchip website?

Started by JohnB, Mar 19, 2021, 11:43 AM

JohnB

I would like some help to programmatically request a specific device data sheet from the Microchip website.

https://www.microchip.com/doclisting/TechDoc.aspx?type=datasheet

I don't know JavaScript and I can't work out from the developer tools in Google Chrome where I should look.
Ultimately I want to make the call from a Delphi program.  Are there any wizards out there who could help?


Thanks in hope...

JohnB
 
DaveS

Hi John

I had to parse a downloaded comma-delimited file to make linker files for PdfNow, but Microchip keep changing things around and that option is no longer available.
But I can see from the link you supplied that it is possible to get the download address for each file. I searched by product, entered pic10 etc. and saved the web page; you could parse that.
The download code that PdfNow uses was also used in TripLogik for downloading the help files; you can use that.

Regards
Dave

JohnB

What I cannot fathom out is how to complete the fields to make the call from code.  When I opened up the page using the developer tools in Chrome, it looked like one great big XML file with calls to loads of JavaScript.

I can easily handle the download bit; it's addressing that code from within Delphi that I am at a loss with.  I will try with one of the Indy VCL components that come with Delphi.
JohnB

Stephen Moss

I don't know if this is what you are looking for, but if Delphi has a Web Browser tool, you need to search the webpage source code for the element ID. For example, in the link you provided, the table row data for the first item listed (1000C-OCXO) is...
            <div>
                <div id="ctl00_MainContent_ListViewDocList1_ctrl0_Product_column">
                    1000C-OCXO
                </div>
                <div>
                    <a href="https://ww1.microchip.com/downloads/en/DeviceDoc/DS_1000C.pdf" id="ctl00_MainContent_ListViewDocList1_ctrl0_hrefLink" target="_blank">
                        <span id="ctl00_MainContent_ListViewDocList1_ctrl0_lbldispTitle">1000C OCXO Datasheet</span></a>

                </div>
                <div>
                    <span id="ctl00_MainContent_ListViewDocList1_ctrl0_lblpublishdateDisplay">06-May-2020</span>
                </div>
            </div>
and the ID for the datasheet link would be id="ctl00_MainContent_ListViewDocList1_ctrl0_hrefLink"
You would need to select that and perform a click event. I am not sure how you would do that in Delphi, but in Visual Studio, which uses the Web Browser tool, it would be WebBrowser1.Document.GetElementById("submit").InvokeMember("click"). In this example the ID is for a submit button, and InvokeMember("click") instigates the mouse click. I have not tried it, but presumably if you replaced "submit" with the ID I indicated above, that would replicate a click on the link, resulting in the file being opened in the web browser, from where it could either be read or saved.

Alternatively, you could just extract the datasheet's URL (https://ww1.microchip.com/downloads/en/DeviceDoc/DS_1000C.pdf) and pass that to the web browser, i.e. WebBrowser1.Navigate("https://ww1.microchip.com/downloads/en/DeviceDoc/DS_1000C.pdf"), to open the PDF in the Web Browser, from where it could either be read or saved.
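
To illustrate the "extract the URL" idea, here is a minimal sketch in Python rather than Delphi, simply because it is short to show. It assumes the page has already been saved to disk or fetched as text, and that every datasheet link carries an id ending in "_hrefLink" as in the row quoted above. The names DatasheetLinkParser, extract_datasheet_links and SAMPLE_ROW are made up for this example, and the real page may well be laid out differently.

```python
from html.parser import HTMLParser


class DatasheetLinkParser(HTMLParser):
    """Collect (id, href) for every <a> tag whose id ends with a given suffix."""

    def __init__(self, id_suffix="_hrefLink"):
        super().__init__()
        self.id_suffix = id_suffix  # assumption: taken from the quoted row
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        element_id = attributes.get("id") or ""
        href = attributes.get("href")
        if href and element_id.endswith(self.id_suffix):
            self.links.append((element_id, href))


def extract_datasheet_links(html_text):
    """Return a list of (element id, URL) pairs found in html_text."""
    parser = DatasheetLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


# The table row quoted in this post, trimmed to the relevant parts.
SAMPLE_ROW = """
<div>
  <div id="ctl00_MainContent_ListViewDocList1_ctrl0_Product_column">1000C-OCXO</div>
  <div>
    <a href="https://ww1.microchip.com/downloads/en/DeviceDoc/DS_1000C.pdf" id="ctl00_MainContent_ListViewDocList1_ctrl0_hrefLink" target="_blank">
      <span id="ctl00_MainContent_ListViewDocList1_ctrl0_lbldispTitle">1000C OCXO Datasheet</span></a>
  </div>
</div>
"""
```

Calling extract_datasheet_links(SAMPLE_ROW) gives one pair: the link's id and the DS_1000C.pdf address. The same approach in Delphi would be a matter of searching the page text for the id and reading the href next to it.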

Although it may be better to combine the two methods: use the first to set the search type to "Document Title", fill in the text box with the device to search for and click the search button to narrow down the list of datasheets, then extract the datasheet URLs from the results. If you did that in a loop for all devices that have a PPI file, you could save all the URLs in a file for future reference.
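
The bookkeeping half of that loop might look roughly like the Python sketch below. It only covers matching already-extracted (title, URL) pairs against a list of device names and saving the result as CSV. Submitting the search form is not shown, because the form field names are not known from this thread. The function names and the case-insensitive "device name appears in the title" rule are assumptions for illustration only.

```python
import csv
import io


def match_links_to_devices(devices, links):
    """Return (device, title, url) rows where the device name occurs in the title.

    Assumption: a case-insensitive substring match is good enough to pair
    a device with its datasheet; the real titles may need a stricter rule.
    """
    rows = []
    for device in devices:
        needle = device.lower()
        for title, url in links:
            if needle in title.lower():
                rows.append((device, title, url))
    return rows


def write_rows(rows, out_file):
    """Write a header line followed by one CSV line per matched datasheet."""
    writer = csv.writer(out_file)
    writer.writerow(["device", "title", "url"])
    writer.writerows(rows)


# Example data: the one title/URL pair quoted earlier in this post.
EXAMPLE_LINKS = [
    ("1000C OCXO Datasheet",
     "https://ww1.microchip.com/downloads/en/DeviceDoc/DS_1000C.pdf"),
]

example_rows = match_links_to_devices(["1000C", "PIC10"], EXAMPLE_LINKS)
buffer = io.StringIO()  # stands in for a real file opened with newline=""
write_rows(example_rows, buffer)
```

Here only "1000C" finds a match, so the buffer ends up holding the header plus one data line. Swapping the StringIO for a file opened for writing would give the "file for future reference".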

Unfortunately, I threw out my Java book; there may have been something useful in it despite it being at least 10 years old. Maybe there is a command in Delphi such as web_client.downloadfile(insert_url_here).
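
For what a "download file" call boils down to, here is a small Python sketch using only the standard library. It is not Delphi and not the code PdfNow uses; download_file is a made-up name, and the 30 second timeout is an arbitrary choice. The Indy components mentioned earlier in the thread would be the place to look for a Delphi equivalent.

```python
import shutil
import urllib.request


def download_file(url, destination_path, timeout=30):
    """Fetch url and write the response body to destination_path.

    The body is streamed to disk in chunks, so a large PDF is not held
    in memory all at once. Returns destination_path for convenience.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(destination_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return destination_path


# Usage (not run here, as it needs network access):
# download_file(
#     "https://ww1.microchip.com/downloads/en/DeviceDoc/DS_1000C.pdf",
#     "DS_1000C.pdf",
# )
```

Whether the site accepts a plain request like this, without browser headers or cookies, is something that would need checking against the live server.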

JohnB

Dave S has helped me and I now have a solution.  Thanks Stephen
JohnB