Perl pack Function
The Perl pack function takes a TEMPLATE and a LIST of values. It converts each value according to the format specified in the template, concatenates the results and returns the resulting string. Its main purpose is to turn data (numbers and strings) into a sequence of bytes that external applications can read easily.
You can use the Perl pack function to prepare binary data either for writing to a file or for network transmission.
The reverse of this function is the unpack function, which takes a sequence of bytes and converts it back into numbers and strings for further processing.
The syntax of the Perl pack function is as follows:
STRING = pack TEMPLATE, LIST
The TEMPLATE consists of a sequence of characters, as shown in the table below. Some letters may be followed by one or more modifiers: for instance, each letter may optionally be followed by a number giving a repeat count, and a * in place of the count means to use however many items are left. In this short tutorial I review some of the most frequently used template characters and illustrate each of them with a few examples.
The following table shows you some of the most frequent template characters:
| a | A string with arbitrary binary data, will be null padded |
| A | A text (ASCII) string, will be space padded |
| b | A bit string (ascending bit order inside each byte, like vec()) |
| B | A bit string (descending bit order inside each byte) |
| c | A signed char (8-bit) value |
| C | An unsigned char (octet) value |
| d | A double-precision float in the native format |
| f | A single-precision float in the native format |
| h | A hex string (low nybble first) |
| H | A hex string (high nybble first) |
| i | A signed integer value |
| I | An unsigned integer value |
| l | A signed long (32-bit) value |
| L | An unsigned long value |
| n | An unsigned short (16-bit) in "network" (big-endian) order |
| N | An unsigned long (32-bit) in "network" (big-endian) order |
| s | A signed short (16-bit) value |
| S | An unsigned short value |
| U | A Unicode character number |
| v | An unsigned short (16-bit) in "VAX" (little-endian) order |
| V | An unsigned long (32-bit) in "VAX" (little-endian) order |
| x | A null byte |
| X | Back up a byte |
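To make the repeat-count rules concrete, here is a minimal sketch that packs a small record with a mixed template and unpacks it again. The record layout (a 3-character tag, a 16-bit number and a single byte) and all variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical record: 3-character tag, 16-bit big-endian number, one byte.
my $template = 'A3 n C';    # whitespace inside a template is ignored
my $record   = pack $template, 'abc', 258, 7;

# 3 bytes for the tag + 2 for the short + 1 for the char
print length($record), "\n";    # 6

# unpack with the same template returns the original values
my ($tag, $number, $flag) = unpack $template, $record;
print "$tag $number $flag\n";   # abc 258 7
```

Using the same template string for pack and unpack keeps the two sides of the conversion in step.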
| a | A string with arbitrary binary data, will be null padded |
The following example shows you how to deal with the Perl pack function and the 'a' template:
#!/usr/local/bin/perl
use strict;
use warnings;
my $str = pack 'a7', '123a'; # "123a\0\0\0"
# split the string into an array of characters
my @array = split //,$str;
# converts the elements of the array into their
# equivalent hex codes
@array = map( sprintf("%x", ord), @array);
# print the array with spaces between elements
print "@array\n";
# it prints: 31 32 33 61 0 0 0
The code begins by calling the Perl pack function. $str receives the result, 'a7' is the template and '123a' is the string to be converted. The 7 in the template is a repeat count: for the 'a' format it gives the length of the resulting field, so null bytes are appended until the string is 7 characters long (a longer string would be truncated to 7 characters).
The following lines of code let you see the content of $str as hexadecimal codes.
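If you have a list of strings to be converted, you can repeat the template letter with the 'x' (repetition) operator. Note that 'a' without a count packs only the first character of each string, and that missing list items are packed as null bytes: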
my $str = pack 'a' x 7, '12', '34', '56'; # "135\0\0\0\0"
You can use the Perl pack function with the 'a' template to convert a Perl string into a null-terminated string that can be used in a C program:
my $cStr = pack('a*x', $perlStr);
Here 'a*' packs the whole string (a bare 'a' would pack only its first character) and the 'x' character appends a null byte as the last byte of the result.
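The effect of the count on the 'a' format can be checked with a short sketch. The sample string 'hello' and the variable names are chosen only for illustration.

```perl
use strict;
use warnings;

my $perlStr = 'hello';    # sample input, chosen for illustration

my $first     = pack 'a',   $perlStr;   # no count: only the first character
my $truncated = pack 'a3',  $perlStr;   # count shorter than the string: truncates
my $padded    = pack 'a8',  $perlStr;   # count longer than the string: null padding
my $cString   = pack 'a*x', $perlStr;   # whole string plus one trailing null

printf "%d %d %d %d\n",
    length($first), length($truncated), length($padded), length($cString);
# prints: 1 3 8 6
```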
| A | A text (ASCII) string, will be space padded |
This template is similar to the previous one, except that the string is padded with spaces instead of nulls. For instance, in the first example above you can use the line:
my $str = pack 'A7', '123a'; # "123a   "
instead of:
my $str = pack 'a7', '123a'; # "123a\0\0\0"
You’ll get as output:
31 32 33 61 20 20 20
where 20 is the hex code for the space character.
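The two formats also differ when unpacking: 'A' strips the trailing padding, while 'a' returns every byte unchanged. A minimal sketch, with variable names chosen for illustration:

```perl
use strict;
use warnings;

my $spacePadded = pack 'A7', '123a';
my $nullPadded  = pack 'a7', '123a';

# 'A' removes trailing spaces (and nulls) on unpack; 'a' keeps the data as is
my $stripped = unpack 'A7', $spacePadded;
my $verbatim = unpack 'a7', $nullPadded;

print length($stripped), " ", length($verbatim), "\n";   # 4 7
```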
| b | A bit string (ascending bit order inside each byte, like vec()) |
The 'b' format of the Perl pack function packs strings consisting of 0 and 1 characters into bytes. A byte is a group of 8 bits.
LSB means the least significant bit, sometimes referred to as the rightmost bit. MSB is the most significant bit, sometimes referred to as the leftmost bit. For example, in the byte 10110010, written in the usual notation, MSB = 1 and LSB = 0.
The 'b' format means that the bits are specified in ascending order, from LSB to MSB. For instance, in the next line of code:
my $nr = ord pack ('b8', '10110010');
the $nr variable will be assigned 77 = 1 + 4 + 8 + 64.
In this representation, the count refers to the number of bits to be packed - in the above example the count is 8.
You can use the Perl pack function with the 'b*' format to translate a string of 0’s and 1’s into a bit string, and the unpack function to get back the list of 0’s and 1’s from the bit string. Here’s an example:
#!/usr/local/bin/perl
use strict;
use warnings;
my @bitArray = qw(1 0 0 0 1 1 1 1 0 0 1 1);
my $bitString = pack 'b*', join('', @bitArray);
@bitArray = split(//, unpack('b*', $bitString));
print "@bitArray\n";
# it prints: 1 0 0 0 1 1 1 1 0 0 1 1 0 0 0 0
Please note that our initial array of bits had only 12 elements, so the Perl pack function set the last 4 bits of $bitString to 0.
| B | A bit string (descending bit order inside each byte) |
The 'B' template is similar to the 'b' template, except that the bits are specified in descending order, from MSB to LSB. For instance, in the next line of code:
my $nr = ord pack ('B8', '10110010');
the $nr variable will be assigned 178 = 2 + 16 + 32 + 128.
You can use the Perl pack function with the 'B*' format in a similar way as shown in a previous example for the 'b' format:
#!/usr/local/bin/perl
use strict;
use warnings;
my @bitArray = qw(1 0 0 0 1 1 1 1 0 0 1 1);
my $bitString = pack 'B*', join('', @bitArray);
@bitArray = split(//, unpack('B*', $bitString));
print "@bitArray\n";
# it prints: 1 0 0 0 1 1 1 1 0 0 1 1 0 0 0 0
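The difference between the two bit orders is easiest to see by unpacking the same byte with both formats. The sketch below uses the value 77 from the 'b' example; the variable names are illustrative.

```perl
use strict;
use warnings;

my $byte = chr 77;    # 77 is 01001101 in binary

my $ascending  = unpack 'b8', $byte;   # LSB first
my $descending = unpack 'B8', $byte;   # MSB first

print "$ascending\n";    # 10110010
print "$descending\n";   # 01001101
```

Each string is simply the other one reversed, because only the order in which the bits of the byte are listed changes.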
| c | A signed char (8-bit) value |
The 'c' template format is for signed char values. Its usage is similar to that of the 'C' template format (see below).
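The signed and unsigned char formats differ only in how the same byte is interpreted, as this minimal sketch suggests (variable names are illustrative):

```perl
use strict;
use warnings;

my $byte = pack 'c', -1;    # one byte with all bits set

my $signed   = unpack 'c', $byte;   # read back as a signed char
my $unsigned = unpack 'C', $byte;   # same byte read as an unsigned char

print "$signed $unsigned\n";   # -1 255
```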
| C | An unsigned char (octet) value |
The 'C' format is used for unsigned characters. Here are a few examples:
#!/usr/local/bin/perl
use strict;
use warnings;
my $str = pack 'CCCC', 97, 98, 99, 100, 101, 102;
# 97 is the numeric value of the ASCII 'a' character
print "$str\n"; # abcd
# 3 is a count for the number of characters packed
$str = pack 'C3', 97, 98, 99, 100, 101, 102;
print "$str\n"; # abc
# x is the repetition operator
$str = pack 'C' x 5, 97, 98, 99, 100, 101, 102;
print "$str\n"; # abcde
# the '*' is like a wildcard for more of the same.
$str = pack 'C*', 97, 98, 99, 100, 101, 102;
print "$str\n"; # abcdef
The following example shows you how to use the Perl pack function with the 'C*' template in conjunction with other Perl functions. The '*' is like a wildcard for more of the same.
#!/usr/local/bin/perl
use strict;
use warnings;
my $str = pack('C*',map ord,split(//,'This is Perl'));
print "$str\n";
# it prints: This is Perl
The split function creates an array from the string 'This is Perl', where each character becomes an element of the array. The map function runs the ord function on each element of the array and returns a list with the ASCII values of the characters. Finally, the Perl pack function with the 'C*' template (for unsigned characters) is applied to all the numbers of the list (if you put a * character inside the template, you don’t need to count the elements of the list argument).
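The split/map/ord chain can also be replaced by unpack with the same 'C*' template, which returns the character codes directly. A minimal sketch, using the sample string 'Perl':

```perl
use strict;
use warnings;

# unpack 'C*' yields one number per byte of the string
my @codes = unpack 'C*', 'Perl';
print "@codes\n";    # 80 101 114 108

# pack 'C*' turns the numbers back into the string
my $again = pack 'C*', @codes;
print "$again\n";    # Perl
```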
| d | A double-precision float in the native format |
The 'd' format of the Perl pack function is for 64-bit floating-point numbers in the native machine format. Its usage is similar to that of the 'f' template format (see below).
| f | A single-precision float in the native format |
The 'f' format of the Perl pack function is for 32-bit floating-point numbers in the native machine format. Because of the variety of floating-point formats around, floating-point data written on one machine may not be readable on another, for example when the two machines have different endianness. You can use this format as in the following line of code:
my $float = pack 'f', 23.13421;
where $float will contain the number in the native float format. To extract the number from this string, you need to use the unpack function:
my $nr = unpack 'f', $float;
Or you can follow the 'f' specifier with a count, if you know how many floats you want to pack:
my $floats = pack 'f2', 3.14, 2.287;
If you have more single-precision float numbers to pack, you can use the '*' repeat count, which packs all the available float numbers from the list:
#!/usr/local/bin/perl
use strict;
use warnings;
my @floatArray = (23.13421, 112.78, 77.896);
@floatArray = unpack ('f*', pack('f*', @floatArray));
print "@floatArray\n";
# it displays: 23.1342105865479 112.779998779297 77.8960037231445
Here the Perl pack function returns a string with 3 single-precision float numbers packed in the native machine format. The unpack function unpacks the 3 numbers from that string into an array.
Finally, the resulting array is printed. As you can notice, the content is close but not identical to the content of the initial array: each value was rounded to single precision when it was packed, which is why the unpacked numbers show extra decimal digits.
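The rounding introduced by 'f' can be compared with the 'd' format in a short sketch. It assumes a Perl built with ordinary double-precision numbers (the usual default); the variable names are illustrative.

```perl
use strict;
use warnings;

my $value = 23.13421;

my $single = pack 'f', $value;   # 32-bit float
my $double = pack 'd', $value;   # 64-bit float

print length($single), " ", length($double), "\n";   # 4 8

my $fromSingle = unpack 'f', $single;
my $fromDouble = unpack 'd', $double;

# The single-precision round trip loses digits; the double one should not
# (assumption: Perl stores its numbers as doubles on this machine).
my $singleError = abs($fromSingle - $value);
my $doubleError = abs($fromDouble - $value);
print "single error: $singleError\n";
print "double error: $doubleError\n";
```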
| h | A hex string (low nybble first) |
The 'h' template format is for packing a hex string by putting the low nibble first. Its usage is similar to that of the 'H' template format (see below).
| H | A hex string (high nybble first) |
The 'H' template format of the Perl pack function is for packing a hex string by putting the high nibble first. If you want to get back the unaltered value of the string, you can use the unpack function but with the same template format. If you use unpack with 'h' format, you’ll get the bytes in the same order but with their nibbles reversed, as you can notice in the next snippet:
my $str = pack 'H*', '6162636465';
print unpack ('H*', $str), "\n"; # it prints: 6162636465
print unpack ('h*', $str), "\n"; # it prints: 1626364656
Here I put a * character inside the template, to avoid counting the hex characters of the string argument.
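A common use of these formats is converting between a string and its hex dump. The following sketch shows both directions, plus the nibble-swapped 'h' form; the sample string 'abcde' matches the bytes used above.

```perl
use strict;
use warnings;

my $hex  = unpack 'H*', 'abcde';   # string to hex digits, high nibble first
my $text = pack 'H*', $hex;        # hex digits back to the string

# the same bytes written low nibble first
my $swapped = pack 'h*', '1626364656';

print "$hex\n";       # 6162636465
print "$text\n";      # abcde
print "$swapped\n";   # abcde
```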
| i | A signed integer value |
The 'i' template format of the Perl pack function generates a signed integer, and you can use it like this:
my $integer = pack 'i', 150;
The number 150 will be converted into the format used to store integers on your machine and the result will be stored in the $integer variable. If you have many integers to pack, you can use the '*' repeat count, which packs all the integers available in the list:
#!/usr/local/bin/perl
use strict;
use warnings;
my @integerArray = (150, 160, 170, 180, 190);
@integerArray = unpack ('i*', pack('i*', @integerArray));
print "@integerArray\n";
# it displays: 150 160 170 180 190
Here the Perl pack function returns a string with 5 integers packed in the integer format specific to your machine. The unpack function unpacks the 5 integers from that string into an array. Finally, the resulting array is printed. As you can notice, the content is equal to the content of the initial array.
But the 'i' format is machine dependent, so if you pack a list of integers into a string and then unpack it on another machine, you may get back a list of unexpected values.
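To see what "machine dependent" means on your own system, you can compare the packed length with the values reported by the standard Config module. This is only a sketch; the output differs from machine to machine, so no expected values are shown.

```perl
use strict;
use warnings;
use Config;    # core module describing how this Perl was built

my $packed = pack 'i', 150;

# The packed length follows the size of a C int on this machine
print "packed length: ", length($packed), " bytes\n";
print "C int size:    $Config{intsize} bytes\n";
print "byte order:    $Config{byteorder}\n";
```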
| I | An unsigned integer value |
If you need to pack unsigned integers, you can use the 'I' template format of the Perl pack function. See the 'i' format examples above; the usage is similar.
| l | A signed long (32-bit) value |
The 'l' format generates a signed long value, which is exactly four bytes (32 bits) long. The byte order depends on whether the machine is little- or big-endian. See the following lines of code for a short example:
my $str = pack('l', 0x61626364);
print "$str\n";
This code creates a four-byte string consisting of either dcba if the machine is little-endian or abcd if the machine is big-endian. Here 0x61, 0x62, 0x63 and 0x64 are the ASCII codes of the a, b, c and d characters.
| L | An unsigned long value |
The 'L' format of the Perl pack function generates an unsigned long value; its usage is similar to that of the signed long format. Its length is exactly 32 bits, which may differ from the size of a long in the local C compiler.
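The dependence of the native long formats on byte order can be illustrated by comparing 'L' with the fixed-order 'N' and 'V' formats from the table. A minimal sketch, with illustrative variable names:

```perl
use strict;
use warnings;

my $number = 0x61626364;

my $native = pack 'L', $number;   # byte order of this machine
my $big    = pack 'N', $number;   # always big-endian:    abcd
my $little = pack 'V', $number;   # always little-endian: dcba

if ($native eq $big) {
    print "native order matches big-endian: $native\n";
} elsif ($native eq $little) {
    print "native order matches little-endian: $native\n";
} else {
    print "native order is neither of the two\n";
}
```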
| n | An unsigned short (16-bit) in "network" (big-endian) order |
The 'n' format tells the Perl pack function to create an unsigned short in network byte order. This format is typical of TCP/IP communications, and you need to use it (or 'N' for bigger numbers) for certain types of TCP/IP communication. You can use it as in the following line of code:
my $nr = pack 'n', 1234, 235;
Because we didn’t provide a repeat count inside the template, the Perl pack function packs just the first number and returns it in the $nr variable. The second number (235) from the list is ignored.
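The sketch below shows the bytes produced with and without a repeat count, using the same two numbers. The hex dump via unpack 'H*' is only there to make the bytes visible.

```perl
use strict;
use warnings;

my $one  = pack 'n',  1234, 235;   # no count: 235 is ignored
my $both = pack 'n2', 1234, 235;   # count of 2: both numbers are packed

print unpack('H*', $one),  "\n";   # 04d2
print unpack('H*', $both), "\n";   # 04d200eb
```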
| N | An unsigned long (32-bit) in "network" (big-endian) order |
The 'N' format tells the Perl pack function to create an unsigned long in network byte order. You can use it in the same way as the 'n' template format. Here’s a short example:
my $nrs = pack 'N*', 45320..45325;
my @array = unpack 'N*', $nrs;
print "@array\n";
# it displays: 45320 45321 45322 45323 45324 45325
If you use the '*' repeat count, you don’t need to provide the count of the numbers you intend to pack. The unpack function was used to extract the numbers from the packed $nrs string and populate an array with them.
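One typical use of 'N' in network code is a length prefix in front of a message. The framing below is a hypothetical example, not a specific protocol: a 4-byte big-endian length followed by the payload.

```perl
use strict;
use warnings;

my $payload = 'This is Perl';

# Hypothetical frame layout: 4-byte big-endian length, then the payload
my $frame = pack('N', length $payload) . $payload;

# The receiver reads the length first, then extracts that many bytes
my $length = unpack 'N', $frame;
my $body   = substr $frame, 4, $length;

print "$length: $body\n";   # 12: This is Perl
```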
| s | A signed short (16-bit) value |
This format is for signed short numbers. If you transfer data across the network or onto a disk of another computer, you must consider the endianness of your computers, because integers and floating-point numbers may be stored in memory in different byte orders. So you must take this into consideration when you use the 's' format. A short example of how to use it:
my $i16 = pack 's*', 21, 77, 100, 256;
In this example the 's' format is combined with '*', which allows you to use the Perl pack function to pack as many short integers as you have in your list. You can determine the endianness of your system by using this format, as you can see in the example below:
#!/usr/local/bin/perl
use strict;
use warnings;
my $v = unpack("h*", pack("s", 1));
if($v =~ /^1/) {
print "Little endian system\n";
} elsif ($v =~ /01/) {
print "Big endian system\n";
} else {
print "Unknown endian format\n";
}
print "$v\n";
# on my Windows system it displays: 1000
On my local Windows computer, after running this code I received the message: 'Little endian system'. The unpack function was used to unpack the packed number in a hex format.
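An alternative sketch of the same check compares the native 's' result with the fixed-order 'v' and 'n' formats instead of inspecting hex digits. The variable names are illustrative.

```perl
use strict;
use warnings;

my $native = pack 's', 1;

my $endianness;
if ($native eq pack('v', 1)) {        # same bytes as the little-endian format
    $endianness = 'little';
} elsif ($native eq pack('n', 1)) {   # same bytes as the big-endian format
    $endianness = 'big';
} else {
    $endianness = 'unknown';
}

print "$endianness endian system\n";
```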
| S | An unsigned short value |
The 'S' format is for unsigned short integers; its usage is similar to that of the 's' format (see above).
| U | A Unicode character number |
The 'U' template format of the Perl pack function allows you to pack a Unicode number into its UTF-8 representation. The Unicode character set associates characters with integers, and converting Unicode characters to UTF-8 lets you store only the bytes that are needed: the most common characters take one or two bytes, while others take three or four. For instance, the next example packs the smiley face Unicode character, which needs three bytes in UTF-8:
my $utfSmiley = pack 'U', 0x263A;
print "length of \$utfSmiley = ", length($utfSmiley),
", length of 0x263A = ", length(0x263A), "\n";
# it displays: length of $utfSmiley = 1, length of 0x263A = 4
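Note what the two lengths measure: the first is 1 because the packed string holds a single character, and the second is 4 because length converts the number 0x263A to the string "9786" and counts its digits. To get back the Unicode number, you can use the unpack function.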
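To compare the character count with the number of UTF-8 bytes, you can encode a copy of the packed string with the built-in utf8::encode function. This is a minimal sketch; the $bytes variable is introduced only for illustration.

```perl
use strict;
use warnings;

my $utfSmiley = pack 'U', 0x263A;

# Work on a copy: utf8::encode converts the characters to UTF-8 bytes in place
my $bytes = $utfSmiley;
utf8::encode($bytes);

print length($utfSmiley), " character, ", length($bytes), " bytes\n";
# prints: 1 character, 3 bytes

print unpack('H*', $bytes), "\n";   # e298ba
```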
Because integers and floating-point numbers are stored in a byte order that depends on the endianness of the system, you can run into format issues when you move binary data across the network. One way to avoid this is to use 'U', the Unicode character number, since a UTF-8 byte sequence does not depend on byte order. You can use the Perl pack function to pack a sequence of numbers as UTF-8 encoded characters on one computer and the unpack function to recover them on another. See the following example, where we use the Perl pack function to pack a few integers into a UTF-8 format:
my @integers = (1234, 23, 456, 789);
my $utfIntegers = pack 'U*', @integers;
@integers = unpack 'U*', $utfIntegers;
print "@integers\n";
# it displays: 1234 23 456 789
You can use the 'U' format to encode the Unicode characters of an alphabet. For instance, the Unicode Hebrew alphabet ranges from 0x0590 to 0x05ff. The following example shows you how to pack and unpack the Hebrew Unicode alphabet:
my $utfHebr = pack 'U*', 0x0590..0x05ff;
my @UniHebr = unpack 'U*', $utfHebr;
| v | An unsigned short (16-bit) in "VAX" (little-endian) order |
The 'v' format is for 16-bit unsigned short numbers. It is similar to the 'n' format but uses little-endian order, so you should use it whenever you need to pack unsigned short numbers in a little-endian format. To get back the number, you can use the unpack function with the same format.
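A minimal sketch of packing and unpacking with 'v' might look like this; the number 1234 is just a sample value.

```perl
use strict;
use warnings;

my $packed = pack 'v', 1234;        # 1234 is 0x04D2; the low byte comes first

print unpack('H*', $packed), "\n";  # d204

my $number = unpack 'v', $packed;   # back to the original number
print "$number\n";                  # 1234
```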
| V | An unsigned long (32-bit) in "VAX" (little-endian) order |
The 'V' template format is for unsigned long (32-bit) numbers; its usage is similar to that of the previous format.
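The relation between 'V' and its big-endian counterpart 'N' can be sketched as follows: the two results contain the same bytes in opposite order. The sample value is arbitrary.

```perl
use strict;
use warnings;

my $number = 0x12345678;

my $little = pack 'V', $number;
my $big    = pack 'N', $number;

print unpack('H*', $little), "\n";   # 78563412
print unpack('H*', $big),    "\n";   # 12345678
```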
| x | A null byte |
You can use the Perl pack function with the 'x' format if you want to pack a null byte. The following example puts a null between the a, b, c characters. The result is stored in the $str variable.
my $str = pack 'CxCxC', 97..99;
print "$str\n"; # "a\0b\0c"
# it displays: a b c (how the null bytes are shown depends on your terminal)
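Like other formats, 'x' accepts a repeat count, and in unpack it skips bytes instead of producing them. A minimal sketch with a made-up layout of two 3-character fields separated by two null bytes:

```perl
use strict;
use warnings;

# Hypothetical layout: 3-character field, 2 null bytes, 3-character field
my $record = pack 'A3 x2 A3', 'abc', 'def';
print length($record), "\n";    # 8

# In unpack, x2 skips the two filler bytes
my ($first, $second) = unpack 'A3 x2 A3', $record;
print "$first $second\n";       # abc def
```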
| X | Back up a byte |
The 'X' format of the Perl pack function is used to move one byte backwards in the string. Here’s an example:
my $binaryString = pack ('C4X2', 97..105);
print unpack ('C*',$binaryString), "\n";
# it displays: 9798
In this code 97..105 are the decimal values of the a-i ASCII characters; the characters 99 and 100 were removed, and the characters 101-105 were not packed at all because there isn’t any specifier for them inside the template. The output of the unpack function tells you that only the first two characters were packed.
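Because 'X' only moves the position back, a format placed after it overwrites the byte that was dropped. A short sketch with sample values:

```perl
use strict;
use warnings;

# Pack a, b, c, back up over the c, then pack d in its place
my $str = pack 'C3 X C', 97, 98, 99, 100;

print "$str\n";   # abd
```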
Dictionary
- Big-endian and little-endian (often remembered as "Big End In" and "Little End In") refer to the order in which the bytes of a multi-byte value are stored in memory. For instance a word like 0x1234 is stored in memory as (0x34 0x12) if the machine is little-endian (in reverse order) and as (0x12 0x34) if the machine is big-endian. The vast majority of Windows machines are little-endian.
- A nibble is a single hex digit of four bits (a half byte) and there are two nibbles in a byte.
- UTF-8 is a variable-length character encoding for Unicode; it encodes each character in 1 to 4 octets, with each of the 128 US-ASCII characters encoded as a single octet (from Wikipedia).
See the perldoc perlpacktut for additional information.