6 Replies Latest reply on Oct 2, 2017 11:27 AM by johnaldridge

    Perl LWP And HTTPS Through A Proxy—Why There Must be a CONNECT

    johnaldridge

      Anyone notice that searching for answers to HTTP command issues is a pain?  I'm pretty sure that "get" is a stop word (found it on at least two listings).  And "connect" is just as bad in any discussion involving connections.

       

      I knew when I started working on this problem that it was almost certainly a client problem, but pinning down exactly which standards apply wasn't going to be easy.

       

      The end of the story is that the client uses Perl LWP, and it simply does a POST with no CONNECT.  The links below cover some detail on this, one of which says that LWP was missing the necessary feature up until about 2013, which is pretty recent considering that I was sure I bought a book on LWP about a decade before (well, at least half a decade).

       

      https://stackoverflow.com/questions/12116244/https-proxy-and-lwpuseragent

      https://community.activestate.com/forum-topic/lwp-https-requests-proxy

      https://ddumont.wordpress.com/2013/11/02/about-lwpuseragent-https-and-proxy-setup/

       

      I've already put McAfee support through their paces in the hope of getting a definitive answer.  What I offer here is an explanation of why it has to be this way.  In short, a proxy needs the CONNECT in order to get a complete socket pathway end-to-end before the SSL/TLS hellos are sent. 

       

      Next are some notes I recorded in hacking my way through this:

       


       

      As a side point, I did some poking around after noticing that the user agent said the script was based on libwww-perl (LWP).  I wanted to see if I could work up a little code fragment (below) to replicate the problem.

       

      No go.  The code that I came up with first does a CONNECT before it does a POST, while your script never does a CONNECT, only a POST.

       

      $ perl -E 'use LWP; print "This is libwww-perl-$LWP::VERSION\n";'

      This is libwww-perl-6.26

       

      My version of LWP (under Cygwin) is slightly more recent than the libwww-perl/6.05 used in your script. You might want to check the version of Perl on your system and see if updates are available.
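
      To compare two systems, it can help to print the versions of the other modules involved in an HTTPS request, not just LWP itself. The following is a minimal sketch; the list of modules is my assumption about what is worth checking, and `module_version` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Return the installed version of $module, 'unknown' if it declares none,
# or 'not installed' if it cannot be loaded.
# (module_version is a hypothetical helper for this sketch.)
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # LWP::UserAgent -> LWP/UserAgent.pm
    return 'not installed' unless eval { require $file; 1 };
    my $version = $module->VERSION;
    return defined $version ? $version : 'unknown';
}

# Assumed list of modules that take part in an HTTPS request made by LWP.
printf "%-22s %s\n", $_, module_version($_)
    for qw(LWP LWP::Protocol::https IO::Socket::SSL Net::HTTP);
```

      Running this on both the machine with the failing script and a machine where the CONNECT happens gives a side-by-side list to compare.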

       

      I’ve passed the traces to McAfee, along with my comments of what I think they mean.

       

      My test fragment/scriptlet:

      $ perl -E '

      $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;

      use HTTP::Request::Common qw(POST);

      use LWP::UserAgent;

      $ua = LWP::UserAgent->new;

      $ua->env_proxy;

      my $req = POST "https://www.ultratools.com/tools/dnsLookupResult", [ domain => "stackoverflow.com" ];

      say $ua->request($req)->as_string;' | sed 's/<[^>]*>//g; /^[[:blank:]]*$/d; /span.rootColumn1/q;'

       

      This is a basic cookbook example, and LWP has gobs of features.  Figuring out if this is a version thing or if the author of the script bypasses the CONNECT feature will be a pain.

       


       

      Earlier on in the discussion I lay out a realization of why a POST without a CONNECT would be undesirable—including a hack that might suffice as a workaround, though it leaves the client side of the connection unencrypted:

       

      We managed to capture an interesting trace, and there are some technological details that I want to share with everybody.

       

      This turned out longer than I thought it would.  I guess I was in the mood for getting into the nuts and bolts.  Yet, it’s valuable knowledge for our troubleshooting chops.

       


        

      The way proxies use HTTP is a bit different than the way HTTP was originally designed.  The most commonly used HTTP command is GET, and the first argument that it takes is the path to a resource.  Not so with a proxy.  One way or another, a proxy needs the entire URL, so they decided to simply have clients send the entire URL in place of the path.  The same is true of a POST and a HEAD.
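
      To make the difference concrete, here is a small sketch that builds the first line of a request both ways. It does no networking; `request_line` and its `$via_proxy` flag are hypothetical names for illustration, and www.example.com is a placeholder host.

```perl
use strict;
use warnings;

# Build the HTTP/1.1 request line for $method and $url.
# $via_proxy is a hypothetical flag for this sketch: true means the client
# is talking to a forward proxy, false means it talks to the server directly.
sub request_line {
    my ($method, $url, $via_proxy) = @_;
    my ($scheme, $authority, $path) = $url =~ m{^(https?)://([^/]+)(/.*)?\z}
        or die "unsupported URL: $url";
    $path = '/' unless defined $path;          # an empty path is sent as "/"
    my $target = $via_proxy
        ? "$scheme://$authority$path"          # proxy: the entire URL
        : $path;                               # direct: only the path
    return "$method $target HTTP/1.1";
}

print request_line(GET => 'http://www.example.com/index.html', 0), "\n";
# GET /index.html HTTP/1.1
print request_line(GET => 'http://www.example.com/index.html', 1), "\n";
# GET http://www.example.com/index.html HTTP/1.1
```

      When talking directly to a server, the host name travels in the Host header instead, which is why the path alone is enough there.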

       

      Now consider HTTPS. The browser knows you want HTTPS from the URL.  But, when a client connects to a proxy, the proxy doesn’t yet have the URL.  So, the client must either always connect via HTTP or always via HTTPS (and we do HTTP, and I don’t happen to know if anyone does HTTPS).  So, with a client already connected via HTTP (no ‘S’), there is no signal to tell the client to switch to HTTPS (or so I’m speculating, as there is a STARTTLS or some such for SMTP and FTPS), so the connection must stay HTTP (again, no ‘S’).  That’s why the default ruleset won’t enable SSL/TLS inspection for anything but an HTTP CONNECT.  A CONNECT gives the proxy an opportunity to find out about the HTTPS before full URLs and content are passed in the SSL/TLS tunnel.

       

      Note that a CONNECT does not take a path; for a proxy, it takes only the destination host (and port), which is in effect a URL without a path.  In the rule traces, those are the requests that do not end in “/” (a slash).
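
      As a rough sketch of what a client sends before any SSL/TLS hello, the following builds the CONNECT request for an HTTPS URL and checks the proxy's reply. It does no networking; `connect_request` and `tunnel_established` are hypothetical helper names, port 443 is assumed when the URL names none, IPv6 literals are not handled, and the sample status line is only an illustration of a typical reply.

```perl
use strict;
use warnings;

# Build the CONNECT request sent to the proxy in clear text.
# The target is host:port only: no scheme and no path.
sub connect_request {
    my ($url) = @_;
    my ($host, $port) = $url =~ m{^https://([^/:]+)(?::(\d+))?(?:/|\z)}
        or die "not an https URL: $url";
    $port = 443 unless defined $port;          # default HTTPS port
    return "CONNECT $host:$port HTTP/1.1\r\nHost: $host:$port\r\n\r\n";
}

# True (1) when the proxy's status line is a 2xx. Only after that does the
# client start the TLS handshake, and only inside the tunnel does it send
# the real POST with the full path, headers, and body.
sub tunnel_established {
    my ($status_line) = @_;
    return $status_line =~ m{^HTTP/1\.[01] 2\d\d\b} ? 1 : 0;
}

print connect_request('https://www.ultratools.com/tools/dnsLookupResult');
print tunnel_established('HTTP/1.1 200 Connection established'), "\n";
```

      The path /tools/dnsLookupResult never appears in the CONNECT; the proxy sees it only if it performs SSL inspection on the tunnel.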

       

      So, how did I nail this down?

       


        

      In discussing the issue with McAfee, they were concerned the plain POST was a problem.  And, Gary Kemp of McAfee put it in my head that it might be interesting to do a trace with SSL inspection enabled.  With a bit of futzing and remembering the above details about HTTP through a proxy, I ended up creating a custom rule set tree in the test environment.  I essentially copied the existing rule-tree, stripped it down, and did a bit of editing.  It only works for one client IP address, it enables certificate verification for a POST (enabling the destination TLS configuration), and enables SSL inspection (with the client-side TLS configuration) for all requests.

       

      Normally, the SSL inspection is only enabled for a CERTVERIFY request.  But, there’s no such thing as a CERTVERIFY request, except in McAfee Web Gateway (AFAIK).  It’s a pseudo command that McAfee invented (AFAIK) for SSL inspection.  It allows a second request cycle to be generated after receiving the server hello from the destination server, so that McAfee can go back through the rules to get instructions on how to validate the server certificate, set up the SSL connection with the client, and start SSL inspection.  But, this second request cycle doesn’t get triggered for a POST, even though a server hello is received.  In fact, McAfee Web Gateway responds to that server hello, and the connection to the destination completes without any problems.

       

      Yes, the script we are trying to make work succeeds when I enable certificate verification for the destination configuration.  (McAfee’s terminology is a bit weird).  Yet, only the destination side of the proxy is encrypted with TLS.  The client side is still HTTP (no ‘S’).

       


       

      Now the script has proxy settings; so whoever wrote it must have tested it through some proxy (we’d hope).  Does that proxy have features that MWG does not?  Should a feature request be put in to the author to have it do a CONNECT before the POST?

       

      Whatever library the script was written with for HTTPS (User-Agent: libwww-perl/6.05) must have the ability to do a CONNECT (again, one would hope).  Ultimately, doing a CONNECT first shares less information in clear text, which makes things more secure.  Whether that’s relevant or not is a matter of whether the headers in those scripts might have some level of sensitivity.  And, since they are currently not doing a CONNECT first, they should have ensured that the content was not sensitive (one would hope).  (And, I notice that the URL has no path.  Also, note that CONNECT requests often have no HTTP headers, not even a User-Agent, which has made some things difficult.)

       

      The test we did in the test environment is a hack.  We are clearly doing things in a way that was not intended. As a workaround, it adds a security risk--in that the client side is a clear channel (and quite sniff-able).  To consider it as a temporary solution, someone first needs to accept the security risk and the consequences thereof.  But now that we’ve made it work in the test environment, I can see that I can do it in a single rule.  So, it’s a feasible workaround—if someone wants to sign off on the risk.