A proxy is a machine or software that does something on behalf of you, the client.
You can also see it as a middle man that sits between you and the server you want to work with, a middle man that you connect to instead of the actual remote server. You ask the proxy to perform your desired operation for you and then it will run off and do that and then return the data to you.
There are several different types of proxies and we shall list and discuss them further down in this section.
Some networks are set up to require a proxy in order for you to reach the Internet or perhaps that special network you are interested in. The use of proxies is introduced on your network by the people and management that run your network for policy or technical reasons.
In the networking space there are a few methods for the automatic detection of proxies and how to connect to them, but none of those methods are truly universal and curl supports none of them. Furthermore, when you communicate with the outside world through a proxy, that often means that you have to put a lot of trust in the proxy, as it will be able to see and modify all the non-secure network traffic you send or get through it. That trust is not easy to assume automatically.
If you check your browser's network settings, sometimes under an advanced settings tab, you can learn what proxy or proxies your browser is configured to use. Chances are good that you should use the same one or ones when you use curl.
As an example, you can find the proxy settings for the Firefox browser under Preferences => General => Network Settings.
Some network environments provide several different proxies that should be used in different situations, and a customizable way to handle that is supported by the browsers. This is called "proxy auto-config", or PAC.
A PAC file contains a JavaScript function that decides which proxy a given network connection (URL) should use, and even if it should not use a proxy at all. Browsers most typically read the PAC file off a URL on the local network.
Since curl has no JavaScript capabilities, curl does not support PAC files. If your browser and network use PAC files, the easiest route forward is usually to read the PAC file manually and figure out the proxy you need to specify to run curl successfully.
Captive portals are not proxies, but they block the way between you and the server you want to access.
A "captive portal" is one of these systems that are popular to use in hotels, airports and for other sorts of network access to a larger audience. The portal will "capture" all network traffic and redirect you to a login web page until you have either clicked OK and verified that you have read their conditions or perhaps even made sure that you have paid plenty of money for the right to use the network.
curl's traffic will of course also be captured by such portals, and often the best way is to use a browser to accept the conditions and "get rid of" the portal, since from then on they often allow all other traffic originating from that same machine (MAC address) for a period of time.
Most often you can use curl too to submit that "ok" affirmation, if you just figure out how to submit the form and what fields to include in it. If this is something you end up doing many times, it may be worth exploring.
curl supports several different types of proxies.
The default proxy type is HTTP, so if you specify a proxy host name (or IP address) without a scheme part (the part that is often written as "http://"), curl assumes it is an HTTP proxy.
curl also allows a number of different options to set the proxy type instead of using the scheme prefix. See the SOCKS section below.
An HTTP proxy is a proxy that the client speaks HTTP with to get the transfer done. curl will, by default, assume that a host you point out with -x or --proxy is an HTTP proxy, and unless you also specify a port number it will default to port 1080 (and the reason for that particular port number is purely historical).
If you want to request the example.com web page using a proxy on 192.168.0.1 port 8080, a command line could look like:
curl -x 192.168.0.1:8080 http://example.com/
Recall that the proxy receives your request, forwards it to the real server, then reads the response from the server and then hands that back to the client.
If you enable verbose mode with -v when talking to a proxy, you will see that curl connects to the proxy instead of the remote server, and you will see that it uses a slightly different request line.
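The difference in the request line can be sketched roughly as follows. This is an illustration only: request_line is a helper name made up for this example, and it prints the general shape of an HTTP/1.1 request line rather than curl's literal verbose output.

```shell
#!/bin/sh
# Illustration only: request_line is a hypothetical helper, not part of curl.
# usage: request_line direct|proxy HOST PATH
request_line() {
    mode=$1
    host=$2
    path=$3
    if [ "$mode" = "proxy" ]; then
        # To an HTTP proxy the client names the full URL, so the proxy
        # knows which remote server to forward the request to.
        printf 'GET http://%s%s HTTP/1.1\n' "$host" "$path"
    else
        # To the server itself only the path is needed.
        printf 'GET %s HTTP/1.1\n' "$path"
    fi
}

request_line direct example.com /
request_line proxy example.com /
```

Comparing the two printed lines with what -v shows for a direct and a proxied transfer is a quick way to confirm which one curl is doing.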
HTTPS was designed to allow and provide secure and safe end-to-end privacy from the client to the server (and back). In order to provide that when speaking to an HTTP proxy, the HTTP protocol has a special request that curl uses to set up a tunnel through the proxy that it then can encrypt and verify. This HTTP method is known as CONNECT.
When the proxy tunnels encrypted data through to the remote server after a CONNECT method sets it up, the proxy cannot see nor modify the traffic without breaking the encryption:
curl -x proxy.example.com:80 https://example.com/
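As a rough sketch of what that tunnel setup looks like on the wire, the snippet below prints a minimal CONNECT request. The helper name connect_request is made up for this illustration, and a real client such as curl typically sends additional headers that are left out here.

```shell
#!/bin/sh
# Illustration only: connect_request is a hypothetical helper, not part of curl.
# It prints a minimal CONNECT request for HOST and PORT.
connect_request() {
    host=$1
    port=$2
    # The request target is host:port, not a URL or a path.
    printf 'CONNECT %s:%s HTTP/1.1\r\n' "$host" "$port"
    printf 'Host: %s:%s\r\n' "$host" "$port"
    # An empty line ends the request headers.
    printf '\r\n'
}

connect_request example.com 443
```

Once the proxy answers with a success response, everything that follows on the connection is passed through as-is, which is why the TLS handshake and the encrypted traffic stay opaque to the proxy.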
MITM means Man-In-The-Middle. MITM-proxies are usually deployed by companies in "enterprise environments" and elsewhere, where the owners of the network have a desire to investigate even TLS encrypted traffic.
To do this, they require users to install a custom "trust root" (Certificate Authority (CA) certificate) in the client, and then the proxy terminates all TLS traffic from the client, impersonates the remote server and acts like a proxy. The proxy then sends back a generated certificate signed by the custom CA. Such proxy setups usually transparently capture all traffic from clients to TCP port 443 on a remote machine. Running curl in such a network would also get its HTTPS traffic captured.
This practice, of course, allows the middle man to decrypt and snoop on all TLS traffic.
An "HTTP proxy" means the proxy itself speaks HTTP. HTTP proxies are primarily used to proxy HTTP but it is also fairly common that they support other protocols as well. In particular, FTP is fairly commonly supported.
When talking FTP "over" an HTTP proxy, it is usually done by more or less pretending the other protocol works like HTTP and asking the proxy to "get this URL" even if the URL is not using HTTP. This distinction is important because it means that when sent over an HTTP proxy like this, curl does not really speak FTP even though given an FTP URL; thus FTP-specific features will not work:
curl -x http://proxy.example.com:80 ftp://ftp.example.com/file.txt
What you can do instead then, is to "tunnel through" the HTTP proxy!
Most HTTP proxies allow clients to "tunnel through" them to a server on the other side. That's exactly what's done every time you use HTTPS through the HTTP proxy.
You tunnel through an HTTP proxy with curl using -p or --proxytunnel.
When you do HTTPS through a proxy you normally connect through to the default HTTPS remote TCP port number 443, so you will find that most HTTP proxies allow connections only to hosts on that port number and perhaps a few others. Most proxies will refuse to let clients connect to just any random port (for reasons only the proxy administrators know).
Still, assuming that the HTTP proxy allows it, you can ask it to tunnel through to a remote server on any port number so you can do other protocols "normally" even when tunneling. You can do FTP tunneling like this:
curl -p -x http://proxy.example.com:80 ftp://ftp.example.com/file.txt
You can tell curl to use HTTP/1.0 in its CONNECT request issued to the HTTP proxy by using --proxy1.0 [proxy] instead of -x.
SOCKS is a protocol used for proxies and curl supports it. curl supports both SOCKS version 4 and version 5, and both versions come in two flavors.
You can select the specific SOCKS version to use by using the correct scheme part for the given proxy host with -x, or you can specify it with a separate option instead of -x.
SOCKS4 is for the version 4 and SOCKS4a is for the version 4 without resolving the host name locally:
curl -x socks4://proxy.example.com http://www.example.com/
curl --socks4 proxy.example.com http://www.example.com/
The SOCKS4a versions:
curl -x socks4a://proxy.example.com http://www.example.com/
curl --socks4a proxy.example.com http://www.example.com/
SOCKS5 is for the version 5 and SOCKS5-hostname is for the version 5 without resolving the host name locally:
curl -x socks5://proxy.example.com http://www.example.com/
curl --socks5 proxy.example.com http://www.example.com/
The SOCKS5-hostname versions. These send the host name to the proxy so there's no name resolving done locally:
curl -x socks5h://proxy.example.com http://www.example.com/
curl --socks5-hostname proxy.example.com http://www.example.com/
HTTP proxies can require authentication, so curl then needs to provide the proper credentials to the proxy to be allowed to use it, and failing to do so will only make the proxy return HTTP responses using code 407.
Authentication for proxies is similar to "normal" HTTP authentication. It is separate from the server authentication to allow clients to independently use both normal host authentication as well as proxy authentication.
With curl, you set the user name and password for the proxy authentication with the -U user:password or --proxy-user user:password option:
curl -U daniel:secr3t -x myproxy:80 http://example.com
This example will default to using the Basic authentication scheme. Some proxies will require another authentication scheme (and the headers that are returned when you get a 407 response will tell you which) and then you can ask for a specific method with --proxy-digest, --proxy-negotiate or --proxy-ntlm. The above example command again, but asking for NTLM auth with the proxy:
curl -U daniel:secr3t -x myproxy:80 http://example.com --proxy-ntlm
There's also the option that asks curl to figure out which method the proxy wants and supports and then go with that (with the possible expense of extra roundtrips) using --proxy-anyauth. Asking curl to use any method the proxy wants is then like this:
curl -U daniel:secr3t -x myproxy:80 http://example.com --proxy-anyauth
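To get a feel for what the default Basic scheme puts on the wire, here is a small sketch that builds the corresponding request header by hand. The helper name proxy_basic_header is made up for this illustration, and it assumes a base64 command line tool is installed; curl does this for you when you use -U.

```shell
#!/bin/sh
# Illustration only: proxy_basic_header is a hypothetical helper, not part of curl.
# Assumes the base64 tool is available on this system.
proxy_basic_header() {
    # $1 is the same user:password string that is passed to -U.
    # Basic authentication only base64-encodes the credentials;
    # it does not encrypt them.
    printf 'Proxy-Authorization: Basic %s\n' "$(printf '%s' "$1" | base64)"
}

proxy_basic_header 'daniel:secr3t'
```

Since base64 is trivially reversible, anyone who can read the traffic between you and the proxy can recover the password from this header.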
All the previously mentioned protocols to speak with the proxy are clear text protocols, HTTP and the SOCKS versions. Using these methods could allow someone to eavesdrop on your traffic on the local network where you or the proxy reside.
One solution for that is to use HTTPS to the proxy, which then establishes a secure and encrypted connection that is safe from easy surveillance.
When an HTTPS proxy is specified, the default port will be 443.
curl checks for the existence of specially named environment variables before it runs to see if a proxy is requested.
You specify the proxy by setting a variable named [scheme]_proxy to hold the proxy host name (the same way you would specify the host with -x). So if you want to tell curl to use a proxy when accessing an HTTP server, you set the 'http_proxy' environment variable.
Like this:
export http_proxy=http://proxy.example.com:80
curl -v www.example.com
While the above example shows HTTP, you can, of course, also set ftp_proxy, https_proxy, and so on. All these proxy environment variable names except http_proxy can also be specified in uppercase, like HTTPS_PROXY.
To set a single variable that controls all protocols, use ALL_PROXY. If a protocol-specific variable also exists, that one takes precedence.
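The lookup described above can be sketched in shell like this. It is a simplified illustration and not curl's actual code: the function name pick_proxy is made up, and the exact order used here (lowercase protocol variable, then its uppercase form except for http, then all_proxy, then ALL_PROXY) is an assumption made for the sketch.

```shell
#!/bin/sh
# Illustration only: pick_proxy is a hypothetical helper, not part of curl.
# usage: pick_proxy SCHEME   (prints the proxy that would be selected, if any)
pick_proxy() {
    scheme=$1
    # Accept only simple lowercase scheme names, since eval is used below.
    case $scheme in
        '' | *[!a-z0-9]*) return 1 ;;
    esac
    upper=$(printf '%s' "$scheme" | tr 'a-z' 'A-Z')

    # 1. The protocol-specific variable in lowercase, such as https_proxy.
    eval "value=\${${scheme}_proxy-}"

    # 2. Its uppercase form, except that HTTP_PROXY is never consulted.
    if [ -z "$value" ] && [ "$scheme" != "http" ]; then
        eval "value=\${${upper}_PROXY-}"
    fi

    # 3. The catch-all variable, used only when nothing more specific is set.
    if [ -z "$value" ]; then value=${all_proxy-}; fi
    if [ -z "$value" ]; then value=${ALL_PROXY-}; fi

    printf '%s\n' "$value"
}

(
    # Example settings, limited to this subshell.
    https_proxy=http://proxy.example.com:80
    ALL_PROXY=socks5://proxy.example.com
    pick_proxy https
    pick_proxy ftp
)
```

The subshell at the end only sets the variables for the demonstration, so your real environment is left untouched.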
When using environment variables to set a proxy, you could easily end up in a situation where one or a few host names should be excluded from going through the proxy. This is then done with the NO_PROXY variable. Set that to a comma-separated list of host names that should not use a proxy when being accessed. You can set NO_PROXY to be a single asterisk ('*') to match all hosts.
As an alternative to the NO_PROXY variable, there's also a --noproxy command line option that serves the same purpose and works the same way.
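The following is a simplified sketch of how such an exclusion list can be evaluated. It is not curl's actual matching code: the function name bypass_proxy is made up, and the rule that an entry also matches sub-domains of the listed name is an assumption made for this illustration.

```shell
#!/bin/sh
# Illustration only: bypass_proxy is a hypothetical helper, not part of curl.
# usage: bypass_proxy HOST LIST
# Succeeds (exit status 0) if HOST should be reached without the proxy.
# The body runs in a subshell so the IFS and globbing changes stay local.
bypass_proxy() (
    host=$1
    list=$2
    # A single asterisk matches all hosts.
    [ "$list" = "*" ] && exit 0
    set -f      # do not expand wildcards while splitting the list
    IFS=,
    for entry in $list; do
        entry=$(printf '%s' "$entry" | tr -d ' ')   # drop stray spaces
        entry=${entry#.}                             # tolerate ".example.com"
        [ -n "$entry" ] || continue
        case $host in
            # Exact match, or HOST is a sub-domain of the entry.
            "$entry" | *."$entry") exit 0 ;;
        esac
    done
    exit 1
)

if bypass_proxy www.example.com "localhost,example.com"; then
    echo "connect directly"
else
    echo "use the proxy"
fi
```

Matching on a leading dot boundary keeps an entry like example.com from accidentally matching an unrelated name such as notexample.com.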
The HTTP version of the proxy environment variables is treated differently than the others. It is only accepted in its lower case version because of the CGI protocol, which lets users run scripts in a server when invoked by an HTTP server. When a CGI script is invoked by a server, it automatically creates environment variables for the script based on the incoming headers in the request. Those environment variables are prefixed with uppercase HTTP_!
An incoming request to an HTTP server using a request header like Proxy: yada will therefore create the environment variable HTTP_PROXY set to contain yada before the CGI script is started. If that CGI script runs curl...
Accepting the upper case version of this environment variable has been the source of many security problems in lots of software over time.
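The naming convention behind this problem can be sketched in a few lines of shell. The helper name cgi_env_name is made up for this illustration; it mimics how a CGI-capable server commonly derives a variable name from a request header name.

```shell
#!/bin/sh
# Illustration only: cgi_env_name is a hypothetical helper.
# It mimics the CGI convention for turning a header name into a variable name.
cgi_env_name() {
    # Uppercase the header name, turn "-" into "_" and prefix it with HTTP_.
    printf 'HTTP_%s\n' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}

cgi_env_name Proxy
cgi_env_name User-Agent
```

A client-supplied Proxy: header thus lands in exactly the variable name that proxy-aware software would otherwise read its proxy setting from.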
When you want to add HTTP headers meant specifically for a proxy and not for the remote server, the --header option falls short.
For example, if you issue an HTTPS request through an HTTP proxy, it will be done by first issuing a CONNECT to the proxy that establishes a tunnel to the remote server and then it sends the request to that server. That first CONNECT is only issued to the proxy and you may want to make sure only that receives your special header, and send another set of custom headers to the remote server.
Send a specific, different User-Agent: only to the proxy:
curl --proxy-header "User-Agent: magic/3000" -x proxy https://example.com/