Pipes, redirection, and standard out and in

One of Unix’s most fundamental features: how programs can be joined together through the use of pipes and redirections.

"This is the Unix philosophy. Write programs that do one thing and do it well. Write programs to work together. Write programs that handle text streams because that is a universal interface." - Doug McIlroy, creator of Unix pipelines

The humble pipe character, |, is easy to overlook in long chains of Unix-style commands. But what the pipe enables – easy communication between independent programs – is essentially what made it possible, for better or for worse, for Unix to have the toolbox that it has.

Via Doug McIlroy:

Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new features… Expect the output of every program to become the input to another, as yet unknown, program.

Basic syntax

Before reading all the ins and outs of standard input and standard output, here's a quick review of the syntax.

The echo command, rigorously following the Unix philosophy of "do one thing, and do it well", takes the text given to it and outputs it to the screen:

user@host $ echo Hey World
Hey World

However, by piping echo's output into a new program, such as tr, I can filter and transform the text.

And by redirecting echo's output, I can add to an existing file (using >>) or create a new file (using >).

How to pipe:

user@host $ echo Hey World | tr '[:upper:]' '[:lower:]'
hey world
user@host $ echo Hey World | tr '[:upper:]' '[:lower:]' | tr ' ' '9'
hey9world

How to redirect into a file:

user@host:~$ cat newfile.txt
cat: newfile.txt: No such file or directory
user@host:~$ echo "Here is some text" > newfile.txt
user@host:~$ cat newfile.txt 
Here is some text
user@host:~$ echo "and some more text" >> newfile.txt
user@host:~$ cat newfile.txt 
Here is some text
and some more text
user@host:~$ echo 'Oops just new text now' > newfile.txt 
user@host:~$ cat newfile.txt 
Oops just new text now
user@host:~$ 

And of course, there's nothing stopping us from piping and redirecting in the same command:

echo "Pipes and arrows" | wc > words_counted.txt
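Here's a minimal sketch of that combination, assuming only standard tools (wc, cat, mktemp) and a throwaway scratch directory of my own invention. The point it illustrates: the redirect belongs to the last command in the pipeline, so the file receives wc's count, not echo's text. I've used wc -w here so that only the word count is written.

```shell
# Work in a scratch directory so nothing in the current folder is touched
workdir=$(mktemp -d)

# echo writes to stdout, the pipe feeds that to wc -w,
# and > sends wc's stdout (the word count) into the file
echo "Pipes and arrows" | wc -w > "$workdir/words_counted.txt"

# The file holds the count, not the original sentence
counted=$(cat "$workdir/words_counted.txt")
echo "Words counted: $counted"

rm -r "$workdir"
```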

An example of using standard input and output

By default, curl downloads a webpage and sends it to standard output, which is normally your computer monitor:

curl www.example.com

As with any other kind of standard output, this can be redirected to a file:

curl www.example.com > myexample.html

Or piped directly into another utility. Here, the output of curl is piped into grep, which reads it from standard input:

curl www.example.com | grep 'Example'

We can of course take the standard output from grep and redirect it into a new file:

curl www.example.com | grep 'Example' > grep_example.txt

An example of not using standard input or output

So what does not using standard input/output look like?

With the -o option, we can specify a filename for curl to save to.

curl www.example.com -o myexample.html

The grep program, when not reading from standard input, can take a filename as an argument; grep will open the file itself and process the data:

grep 'Example' myexample.html
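To see that these are just different routes to the same result, here is a small sketch that runs grep three ways against a made-up local file (no network needed; the file contents are invented for illustration):

```shell
workdir=$(mktemp -d)
printf 'Example Domain\nnothing here\nAnother Example\n' > "$workdir/myexample.html"

# 1. grep opens the file itself (no stdin involved)
from_arg=$(grep 'Example' "$workdir/myexample.html")

# 2. the shell opens the file and hands its contents to grep on stdin
from_stdin=$(grep 'Example' < "$workdir/myexample.html")

# 3. another program's stdout is piped into grep's stdin
from_pipe=$(cat "$workdir/myexample.html" | grep 'Example')

# All three variables now hold the same two matching lines
echo "$from_arg"

rm -r "$workdir"
```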

These are simple examples, but I just wanted to get the syntax and functionality nailed down. The rest of this guide covers a little more jargon and syntax. But if you can accept the ability to send the output of one program directly into another, then you'll understand why the "Unix way" of thinking – small, single-purpose tools – makes for a very powerful and elegant system.

Standard output and input

One of the most significant consequences of pipes in Unix is that Unix programs, whenever possible, are designed to read from standard input (stdin) and print to standard output (stdout).

These jargony terms refer to streams of data, standardized as plain text. When McIlroy refers to text streams as a "universal interface", he means that when programmers think in terms of text, they have to worry much less about how to get programs, or devices, to work together. That makes it much easier to build complex data-processing software with tools as "basic" as cat, grep, and sort.

From the Linux Information Project:

The introduction of standardized streams represented a major breakthrough in the computer field when it was incorporated into the original UNIX operating system more than three and a half decades ago, because it eliminated the very complex and tedious requirement of having to adjust the output of each program according to the specific device or program to which it was being sent.

At this point, we ourselves may not have written anything that we consider "software" or "complex". Even so, knowing that plain text streams are the default (and often best) way for programs to talk to each other will be key in understanding how to create complex, useful software.

What is not standard input and output?

So many of the programs and utilities in Unix-land read from stdin and print to stdout that it's helpful to define stdin and stdout through examples of programs that don't use them. In other words, these programs were not meant to pipe the results of their actions straight into another program, or onto your display.

The directory-creating mkdir program is an obvious example:

user@host:~$ mkdir one_dir two_dir three_dir

When mkdir executes, it simply creates three directories. There's nothing for it to output, except error messages. So it doesn't make sense for it to have something to send along to another program. Similarly, mkdir isn't intended to read a bunch of text output from another program and create directories (though there are certainly ways to do that).
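One of those "ways to do that" is xargs, which reads text from stdin and turns it into arguments for another command. A minimal sketch, assuming simple directory names with no spaces (xargs splits on whitespace by default) and a scratch directory so nothing real is touched:

```shell
workdir=$(mktemp -d)

# mkdir ignores stdin, so piping names straight into it would do nothing useful.
# xargs reads the names from stdin and passes them to mkdir as arguments.
(cd "$workdir" && printf 'one_dir\ntwo_dir\nthree_dir\n' | xargs mkdir)

# List what was created
created=$(ls "$workdir")
echo "$created"

rm -r "$workdir"
```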

The program unzip is another such example. It'd be nice to curl down a zip file and pass it straight to unzip, which would bypass the need to save the zip file:

user@host:~$ curl http://example.com/some.zip | unzip

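But that won't work: unzip expects the name of a zip file as an argument rather than reading the archive from stdin. By default, unzip does send something to stdout, namely the list of files it unzipped: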

user@host:~$ unzip wh-listings.zip
Archive:  wh-listings.zip
  inflating: 0.html                  
  inflating: 1.html                  
  inflating: 10.html                 
  inflating: 100.html                

However, with the -p option, unzip will send the contents of the files into standard out.

If you've been using curl, you know that it will download a file and dump it onto your screen. A similar tool, wget, will by default save the results of the download into a file:

user@host:~$ wget http://www.example.com
--2015-01-19 14:10:15--  http://www.example.com/
Resolving www.example.com (www.example.com)... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Connecting to www.example.com (www.example.com)|93.184.216.34|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1270 (1.2K) [text/html]
Saving to: 'index.html'

100%[===================>] 1,270       --.-K/s   in 0s      

2015-01-19 14:10:15 (118 MB/s) - 'index.html' saved [1270/1270]
# check out the saved file
user@host:~$ ls
index.html

Why does wget provide the convenience of saving to file by default? Because wget was designed not just for downloading single URLs, but for spidering/mirroring entire sites. You may recall the recent data-gathering exploits of a security geek named Edward Snowden; wget was reportedly his tool of choice.

Note: wget is a great tool, but there's a reason I haven't shown many examples of using it: few of our tasks require mirroring a whole site. And if you do mirror a site, you still need to scrape it for data anyway. The use of curl better reflects our focused intentions in collecting data. If you do use wget, be aware that you may accidentally copy much, much more than you intended.

stdin: Standard input

By default, the device that stdin reads from is your keyboard.

For example, the mail program, by default, will prompt the user to fill in the Subject and Cc: options, and then take input from the user's keyboard to fill the body of an email message, including new lines, until the user presses Ctrl-D:

dun@corn30:~$ mail dun@stanford.edu
Subject: Hey there
Hey dan,

Just wanted to see how Standard Input works

Sincerely,
Dan
Cc: 

It's easier to see this interaction via animated GIF (note that I mistakenly pressed Ctrl-X instead of Ctrl-D in the recording):

img

Read from a file with left-angle bracket

Use the left-angle-bracket, <, followed by a filename, to open that file and pass its contents into stdin:

user@host:~$ wc < words.txt

Note that many utilities, such as wc (word count), have been designed to open a filename that is passed in as an argument, so that using < is optional:

user@host:~$ wc words.txt
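The two forms are not quite identical, though. Here's a small sketch, using an invented three-line words.txt in a scratch directory, that shows the difference: when wc opens the file itself it knows the filename and prints it, but when it reads stdin it only sees anonymous text.

```shell
workdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$workdir/words.txt"

# Given a filename, wc knows the name and prints it after the count
with_name=$(wc -l "$workdir/words.txt")

# Reading stdin, wc only sees a stream of text, so it prints the count alone
count_only=$(wc -l < "$workdir/words.txt")

echo "$with_name"
echo "$count_only"

rm -r "$workdir"
```

The stdin form is handy in scripts when you want just the number and no filename to strip off.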

The left-angle-bracket is often seen with while loops, in which a file's contents are passed into the read program one line at a time, and the do/done block is executed for every line in the file.

# IFS= and -r keep each line exactly as written (leading spaces, backslashes)
while IFS= read -r data_line
do
    echo "Line: $data_line"
done < file.txt

Because Bash splits the output of $(cat file.txt) on whitespace, any line that contains a space gets broken into separate words. That is why a read-while loop, as above, is considered more reliable than, and preferable to, this:

user@host:~$ for data_line in $(cat file.txt) 
> do
>   echo "Line: $data_line" 
> done
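To make the difference concrete, here is a sketch that counts iterations both ways over the same made-up two-line file (the file contents and the counter variable names are mine, just for illustration):

```shell
workdir=$(mktemp -d)
printf 'first line\nsecond line\n' > "$workdir/file.txt"

# while/read: one iteration per line
while_count=0
while IFS= read -r data_line
do
    while_count=$((while_count + 1))
done < "$workdir/file.txt"

# for/$(cat): one iteration per whitespace-separated word
for_count=0
for data_line in $(cat "$workdir/file.txt")
do
    for_count=$((for_count + 1))
done

echo "while read saw $while_count lines; for saw $for_count words"

rm -r "$workdir"
```

The file has two lines but four words, so the for loop runs twice as many times as you probably wanted.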

stdout: Standard output

By default, the device that stdout is sent to is your display monitor. For example, when you use the curl downloading tool to download a file (such as a webpage), it will, by default, send the contents of that downloaded file to stdout – i.e. your screen:

user@host:~$ curl http://www.example.com
<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">
    body 
# and so forth...

Here's a GIF of that result – as you can see, it's typically not desirable to output the contents of a large file onto your display monitor, as there is too much text to read at once:

img

Using pipes to redirect stdout to stdin

With a pipe, I can send the output of curl into a program that filters the data: head, to show just the first few lines, or tail, to show the last few lines (I use the -s option for curl to silence the progress indicator):

user@host:~$ curl -s http://www.example.com | head -n 4
<!doctype html>
<html>
<head>
    <title>Example Domain</title>
user@host:~$ curl -s http://www.example.com | tail -n 4
    <p><a href="http://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
</html>
user@host:~$ 
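The same filtering works on any stream, not just curl's. Here's a sketch that needs no network connection, using seq as a stand-in (my substitution, not part of the example above) to generate 100 lines of output:

```shell
# seq stands in for curl here: it just writes the numbers 1..100 to stdout
first_three=$(seq 1 100 | head -n 3)
last_two=$(seq 1 100 | tail -n 2)

# Pipes chain: take the first 10 lines, then the last line of those
tenth=$(seq 1 100 | head -n 10 | tail -n 1)

echo "$first_three"
echo "$last_two"
echo "$tenth"
```

Chaining head into tail like this is a quick way to pluck a single line out of the middle of a stream.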

Changing how mail reads

Remember the mail program which prompted me to fill in the subject, Cc, and body field for an email?

Let's see what happens when I pipe echo directly into mail:

user@host:~$ echo this is stdout | mail dun@stanford.edu

And that's it! mail doesn't even bother to ask me to fill in a subject or other optional fields. Check out the GIF recording:

img
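How can a program tell the difference? I haven't checked how mail itself decides, so treat this as an illustration of a common technique rather than a description of mail: a program can ask whether its stdin is attached to a terminal. In the shell, that check is [ -t 0 ]. The describe_stdin function below is a hypothetical helper I made up for this sketch.

```shell
# Hypothetical helper, not taken from mail: report where stdin comes from.
# [ -t 0 ] is true when file descriptor 0 (stdin) is attached to a terminal.
describe_stdin() {
    if [ -t 0 ]; then
        echo "stdin is a terminal: a program could prompt the user here"
    else
        echo "stdin is a pipe or file: just read it until it ends"
    fi
}

# Piped in, the way echo was piped into mail above
piped=$(echo "this is stdout" | describe_stdin)
echo "$piped"
```

Run describe_stdin by itself at an interactive prompt and you should get the other message.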

Redirecting stdout to a file with right-angle bracket

When you want to create a file from standard output, use the right-angle bracket, >, to redirect output into a new file:

user@host:~$ grep hey file.txt > heys.txt

As we've seen before, curl has the -o option to specify a filename to redirect its output to. However, since curl obeys the practice of using stdout, we could just use the redirection operator:

user@host:~$ curl www.example.com > example.html

Warning: By default, if the file at the pointy end of the redirection operator already exists, its contents are destroyed and replaced.
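If that default makes you nervous, the shell has a noclobber option that makes > refuse to overwrite an existing file. A minimal sketch, using a throwaway notes.txt in a scratch directory (both invented for this example):

```shell
workdir=$(mktemp -d)
echo "precious data" > "$workdir/notes.txt"

# With noclobber on, > refuses to overwrite an existing file
# (the shell prints an error and the command fails)
set -o noclobber
echo "oops" > "$workdir/notes.txt" || echo "overwrite refused"
after_refusal=$(cat "$workdir/notes.txt")

# >| overrides noclobber when you really do mean to replace the file
echo "replaced on purpose" >| "$workdir/notes.txt"
after_force=$(cat "$workdir/notes.txt")

# Put the shell back to its default behavior
set +o noclobber
rm -r "$workdir"
```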

Append with double-right-angle bracket

The use of double-right-angle-brackets, >>, will also redirect stdout, but will either create a new file, or, if the file exists, append to it.

user@host:~$ for num in $(seq 1 100); do
> echo "$num" >> numbers.txt
> done

Warning: A common source of misery is using > when you meant to use >>
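When a loop produces all of a file's contents, one alternative is to redirect the loop as a whole instead of appending inside it. This sketch compares the two approaches using scratch files with made-up names:

```shell
workdir=$(mktemp -d)

# Appending inside the loop: the file is reopened on every iteration,
# and running the loop a second time would add another 100 lines
for num in $(seq 1 100); do
    echo "$num" >> "$workdir/appended.txt"
done

# Redirecting the loop as a whole: the file is opened once and
# starts out empty each time the loop is run
for num in $(seq 1 100); do
    echo "$num"
done > "$workdir/redirected.txt"

appended_lines=$(wc -l < "$workdir/appended.txt")
redirected_lines=$(wc -l < "$workdir/redirected.txt")
echo "$appended_lines $redirected_lines"

rm -r "$workdir"
```

Both files end up with the same 100 lines on a first run; the difference only shows up when you run the loop again.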

Input file and output file on the same line

command < input-file > output-file
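As a concrete instance of that pattern, here is a sketch with sort, reading one invented file and writing another:

```shell
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/unsorted.txt"

# < feeds unsorted.txt to sort's stdin; > captures sort's stdout in sorted.txt
sort < "$workdir/unsorted.txt" > "$workdir/sorted.txt"

sorted=$(cat "$workdir/sorted.txt")
echo "$sorted"

# Caution: do not use the same file on both sides. The shell empties the
# output file before the command runs, so the input would be lost.
rm -r "$workdir"
```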

TLDP.org has a comprehensive list of the many ways stdout and stdin (and stderr, standard error) can be redirected. For the purposes of this course, I try to keep things pretty simple.

Useless use of cat award

Many examples on this site feature a "useless use of cat":

user@host:~$ cat words.txt | head -n 10

This is considered a "useless" use of cat because we should use the stdin-redirection operator:

user@host:~$ head -n 10 < words.txt 

Or we could just use head and pass words.txt as an argument:

user@host:~$ head -n 10 words.txt

However, I don't mind the occasional useless use of cat when it reinforces the concept of stdout/stdin and text streams passing from one program to another, even if cat is being a bit useless. Also, it can be easier to conceptualize the stream moving left to right.

Wikipedia has more examples of useless cats

Note: When cat is doing its job, i.e. operating on multiple files, then it does make a difference. Compare the output from using cat here:

user@host:~$ cat *.txt | grep 'x'

versus:

user@host:~$ grep 'x' *.txt
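Here's a sketch of that comparison with two small invented files, so you can see what changes:

```shell
workdir=$(mktemp -d)
printf 'xylophone\napple\n' > "$workdir/a.txt"
printf 'box\n' > "$workdir/b.txt"

# Through cat, grep sees one anonymous stream, so it prints only the matches
via_cat=$(cat "$workdir"/*.txt | grep 'x')

# Given several filenames, grep prefixes each match with the file it came from
via_args=$(cd "$workdir" && grep 'x' *.txt)

echo "$via_cat"
echo "$via_args"

rm -r "$workdir"
```

So cat is not useless here: it is what strips the filenames away and gives grep a single combined stream.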

If you like videos, Software Carpentry has a nice tutorial on pipes and filters.