Wednesday, July 6, 2022

Export Website as PDF - Using Headless browser

There are many ways to download a website's HTML in C# and export it as PDF or any other format. However, most of them cannot execute the page's JavaScript, so the page is not rendered fully.


Option 1: 

Simply use WebClient to download the website's HTML, then convert it to PDF

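using System.IO;
using System.Net;

using (WebClient client = new WebClient())
{
    // Downloads the raw HTML only; JavaScript on the page is not executed
    byte[] websiteData = client.DownloadData("somewebsiteurl");
    File.WriteAllBytes("savepath", websiteData);
    // do further steps to convert html to pdf
}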


Option 2: 

Use a headless browser with Puppeteer to export a PDF from a URL. Here you can add wait handlers so that the page's JavaScript finishes rendering before the export.

Ref: 

https://developer.chrome.com/docs/puppeteer/

https://www.puppeteersharp.com/  


Using JS: 

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // Saves a PNG screenshot; use page.pdf({ path: 'example.pdf' }) to write a PDF instead
  await page.screenshot({ path: 'example.png' });

  await browser.close();
})();
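
Since the goal is a PDF rather than a screenshot, here is a minimal sketch of the same flow with a wait handler added. The function name `exportPdf` and its `launcher` parameter are illustrative and not part of Puppeteer. `waitUntil: 'networkidle0'` asks Puppeteer to treat navigation as finished once there have been no network connections for at least 500 ms, which is often enough for script-driven pages but is not a guarantee that rendering is complete.

```javascript
// Sketch: load a page, wait for network activity to settle, then save a PDF.
// `exportPdf` and `launcher` are hypothetical names used for illustration.
// The default value is only evaluated when no launcher is passed in,
// so Puppeteer is loaded lazily.
async function exportPdf(url, outputPath, launcher = require('puppeteer')) {
  const browser = await launcher.launch();
  try {
    const page = await browser.newPage();
    // Wait handler: resolve only after the network has been idle for 500 ms
    await page.goto(url, { waitUntil: 'networkidle0' });
    // Write the rendered page as an A4 PDF, including background graphics
    await page.pdf({ path: outputPath, format: 'A4', printBackground: true });
  } finally {
    // Always close the browser, even if navigation or export fails
    await browser.close();
  }
}
```

Calling `exportPdf('https://example.com', 'example.pdf')` should write the PDF into the working directory. If a page needs a specific element before export, `await page.waitForSelector('...')` can be added after `goto`, with a selector that matches that page.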


Using C#: 


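using PuppeteerSharp;

// Placeholder output path for the screenshot
var outputFile = "example.png";

// Download a compatible browser build if one is not already cached
using var browserFetcher = new BrowserFetcher();
await browserFetcher.DownloadAsync();
await using var browser = await Puppeteer.LaunchAsync(
    new LaunchOptions { Headless = true });
await using var page = await browser.NewPageAsync();
await page.GoToAsync("http://www.google.com");
// Saves a screenshot; page.PdfAsync(path) writes a PDF instead
await page.ScreenshotAsync(outputFile);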


Note: we can also use an already installed browser to export the website, as shown below



var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true,
    // --disable-web-security turns off the same-origin policy; use it only with trusted pages
    Args = new[] { "--disable-features=site-per-process", "--disable-web-security" },
    ExecutablePath = "any Chromium browser exe path"
});