When (integration) testing scrapers you need to strike a balance between “get as much input data as possible to cover parsing edge-cases” and “don’t DoS the backend, please”. Caching can help with that, and gives you a nice performance boost during testing on top.
The OkHttp HTTP client library has a built-in cache, but by default it follows the caching headers sent by the server. It also lets you force a particular request to be served from the cache only (CacheControl.FORCE_CACHE), which fails if no cached response exists yet, but that is too harsh (we do want to fetch the first time, after all). It turns out you can get both: serve from the cache whenever an entry exists, however stale, and go to the network only when it doesn't:
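    import okhttp3.Cache
    import okhttp3.CacheControl
    import okhttp3.OkHttpClient
    import java.io.File
    import java.util.concurrent.TimeUnit

    // `log` is assumed to be an SLF4J-style logger defined elsewhere.
    val cachedHttpClient = OkHttpClient.Builder()
        // On-disk cache, capped at 512 MiB
        .cache(Cache(File(".download-cache/okhttp"), 512L * 1024 * 1024))
        // Application interceptor: ask for cached responses no matter how stale they are
        .addInterceptor { chain ->
            chain.proceed(
                chain.request().newBuilder()
                    .cacheControl(
                        CacheControl.Builder()
                            .maxStale(Int.MAX_VALUE, TimeUnit.SECONDS)
                            .build()
                    )
                    .build()
            )
        }
        // Network interceptor: only runs when a request actually goes out to the network
        .addNetworkInterceptor { chain ->
            log.info("Fetching {}", chain.request().toString())
            chain.proceed(chain.request())
        }
        .build()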
This is Kotlin code, but you get the point.
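If you want to check where a given response actually came from, OkHttp exposes that on the response itself. Here is a minimal sketch, assuming OkHttp 4.x (Kotlin property accessors such as response.code). The helpers responseSource and fetch are names I made up for illustration, and https://example.com/ is just a placeholder URL. The main function also shows the "too harsh" cache-only mode on an empty cache, where OkHttp's documented behavior is a synthetic 504 without touching the network.

```kotlin
import okhttp3.Cache
import okhttp3.CacheControl
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.Response
import java.nio.file.Files

// Hypothetical helper: classifies a response by where OkHttp obtained it.
fun responseSource(response: Response): String = when {
    // Both set: a cache entry existed and a conditional request went to the server.
    response.cacheResponse != null && response.networkResponse != null -> "conditional"
    response.networkResponse != null -> "network"
    response.cacheResponse != null -> "cache"
    // Neither set: e.g. an unsatisfiable cache-only request.
    else -> "none"
}

// Hypothetical helper: runs a GET and returns the status code plus the source label.
fun fetch(
    client: OkHttpClient,
    url: String,
    cacheControl: CacheControl? = null
): Pair<Int, String> {
    val builder = Request.Builder().url(url)
    if (cacheControl != null) builder.cacheControl(cacheControl)
    return client.newCall(builder.build()).execute().use { response ->
        // Read the body to the end; OkHttp only commits a cache entry once it is consumed.
        response.body?.string()
        response.code to responseSource(response)
    }
}

fun main() {
    // Fresh, empty cache in a temporary directory.
    val cacheDir = Files.createTempDirectory("okhttp-cache-demo").toFile()
    val client = OkHttpClient.Builder()
        .cache(Cache(cacheDir, 10L * 1024 * 1024))
        .build()

    // Cache-only request against an empty cache: nothing to serve, network off limits.
    println(fetch(client, "https://example.com/", CacheControl.FORCE_CACHE))
}
```

Calling fetch with the max-stale client from above instead should report "network" the first time a URL is requested and "cache" afterwards, provided the server's response is cacheable at all (OkHttp will not store responses marked no-store, for example).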