arrays of strings in C/C++
September 27, 2008 4:56 AM

I am currently a second-semester programming student, and I missed class because of jail; I was trespassing. My problem is that in my structured programming class we are using C/C++, and I need to know how to make an array of strings: multiple strings stored in an array. Can someone help or guide me in the right direction?
posted by phllip.phillip to Computers & Internet (21 answers total) 6 users marked this as a favorite
 
Is this C or C++?

In C++, use the string class, then make an array of them.

In C (someone will be along soon to correct me if I'm wrong) you need to treat a string as an array of chars, so what you're actually looking for is a 2D array of chars.

Do you know how big the arrays have to be, or does it have to be decided at runtime?
posted by handee at 5:07 AM on September 27, 2008


C: Iliffe vector
posted by Loto at 5:12 AM on September 27, 2008


Response by poster: The length of the arrays is nine digits. Yet I would also appreciate knowing how to make arrays where the length is decided at runtime.
posted by phllip.phillip at 5:13 AM on September 27, 2008


mpl2: That is completely wrong.

phillip.phillip: I'll expand on the Iliffe vector. It's just a way to handle multidimensional arrays in C by creating an array of pointers, each of which points to a separately allocated row. So, say we wanted to store four five-character words; you would implement it like this (excuse my quick and dirty code):

char *words[4];
int i;
for (i = 0; i < 4; i++) {
    words[i] = malloc(6); /* five characters plus the terminating null */
}

You now have room for an array of four five-character words.
posted by Loto at 5:37 AM on September 27, 2008


Ah shit, MeFi ate my code.

Here is another link: C-Faq
posted by Loto at 5:40 AM on September 27, 2008


Response by poster:
#include
#include //string class
#include
#include
#define getPairSum(d1,d2) (d1- '0')*10 + d2 - '0'
#define REPORTHEADA " Input 10Digit Numeric Validated\n"
#define REPORTHEADB " Code Sum CodeValue Code\n"
#define REPORTFORMAT " %-12s%5d%12d%15s%c\n"

using namespace std;

int main(void)
{
??? creditCode[11]; //What do I put here to make this an array of strings
int sumOfPairs;
int sumMOD26;
char validationChar;

posted by phllip.phillip at 5:41 AM on September 27, 2008


Here's some C++ that demonstrates creating, modifying, and reading from an array of strings:

#include <iostream>
#include <string>
using namespace std;

int main ()
{
string people[3] = {"God","Mary","Jesus Christ"};
string places[3] = {"dick","dick","dick"};

for (int i=0 ; i<3 ; i++) {
std::cout << "I punched " << people[i] << " in the " << places[i] << "\n";
}

people[0] = "Cheney";
people[1] = "Powell";
places[1] = "colon";
people[2] = "George";
places[2] = "bush";

for (int i=0 ; i<3 ; i++) {
std::cout << "I punched " << people[i] << " in the " << places[i] << "\n";
}

return 0;
}

posted by Mike1024 at 6:14 AM on September 27, 2008 [5 favorites]


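In C++, 'string' is a standard library class (std::string, declared in the <string> header), and you can make arrays of it just as you would with a built-in type.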

//array of ints, size 9
int arrayofints[9];

//array of strings, size 9
string arrayofstrings[9];
posted by jacalata at 6:18 AM on September 27, 2008


Note that the C (& C++) entrypoint is
int main(int argc, char **argv){

and that argv is, in fact, an array of char*, so you've been using them all the time.

As for static initialization in C-style,

char *aos[]= {"First","Second","Third",0};

works, but gets messy if aos is in a struct or class.
posted by hexatron at 6:38 AM on September 27, 2008


A dynamically allocated array sounds like a vector...

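#include <iostream>
#include <string>
#include <vector>

using std::cin;
using std::cout;
using std::string;
using std::vector;

int main(void)
{
vector<string> v;
string s;

cout << "Enter strings: ";
// Keep reading until the input runs out
while(cin >> s)
{
v.push_back(s);
}
return 0;
}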
posted by BigVACub at 6:48 AM on September 27, 2008


Best answer: OK, let's break this down.

1. We're not here to do your homework.
2. The answer is in your assigned text; read it.
3. Your code looks like crap, for a variety of reasons (see below).
4. Because I'm feeling very generous today, I will give you an answer like the one you would receive in comp.lang.c or comp.lang.c++, which means acerbic and pedantic but correct. Answer follows:

C and C++ are different languages with a similar syntax. While a correct answer for the C language can also be applied in the C++ language, that would contravene the whole philosophy behind C++. Therefore, we will give separate answers for each language. Since some clues in your code suggest you are using C++, and because the C++ answer is simpler, we will give that first.

C++ answer: in modern C++, a programmer should, as much as feasible, leave memory management (which is, to invoke a favorite phrase of Stroustrup's, "tedious and error-prone") to the Standard Library. So the C++ answer to "an array" is "use std::vector" and the C++ answer to "a string" is std::string. Both of these are Standard, well-known, mostly efficient, and well-tested. Unless you have a good, demonstrable, quantifiable reason to not use them, you should use them. (In particular, your belly-rumblings about "efficiency" are not good reasons.)

Since vector is a parameterized type (or "templated type"), we need to tell the compiler the type to template on. Thus, the declaration of a vector of strings is:

std::vector< std::string > someNameOfTheVector ;

This creates a vector that can hold strings, but that is currently empty of any strings. To create a vector that holds 10 strings, we can write this:

std::vector< std::string > someNameOfTheVector(10) ;

A vector of strings created this way will hold 10 strings, each built with the no-argument string ctor, which results in a vector of 10 empty, zero-length strings. If we wanted to create the strings (so the vector is not empty), but didn't like having empty strings, we could pass in a string that would be copied to make each string in the vector:

std::vector< std::string > someNameOfTheVector(10, "copy this string") ;

A few notes about std::string. std::string is actually a typedef, an alias for a templated type. You didn't know this before, and it doesn't matter to you now, as you can use it just as if it were a non-templated class. std::string manages its own memory, so you can assign bigger strings to smaller ones, or whatever, without ever having to worry about buffers or memory management or pointers or null terminating.

That is to say, it's far easier to use than a C-style string. But since C-style strings are so ubiquitous, the string class has a special constructor that creates a std::string from a C-style string, and the language adds a special relaxation of the rules about const correctness to make this even easier.

Ok, so that's the answer to your question, if you are indeed writing in C++. Now on to your code. If it's C++, it's crap. In C++, while you can use preprocessor #defines, you shouldn't, except in a few special cases. This is because #defines are textually substituted by the preprocessor, and that loses you the type-safety that is a primary reason for using C++ in the first place.

Your first #define, #define getPairSum(d1,d2) (d1- '0')*10 + d2 - '0', should be replaced with a function, and the arguments specified as char or int as appropriate. (I assume the arguments are the characters representing decimal digits.)

The other #defines should be unmodifiable C-strings, or std::strings. Since we have the std::string converting constructor I mentioned above, the usual idiom is to use an unmodifiable C-string. This, and for compatibility with libraries that use C-style strings, is pretty much the only place in C++ where you should be using C-style strings. In C++ "unmodifiable" means "const". const char * means a pointer to a char that can't be modified through that pointer (or, legally, through any other pointer). Adding another const after the pointer means that the pointer can't be re-pointed somewhere else. By adding an equal sign and a constant string expression in double quotes, we have a C-style string that can't be modified and can be used anywhere a std::string is required:

const char * const MANIFEST_CONSTANT_STRING = "Read only string" ;

Having gotten rid of the #defines, you should also change the signature of main. While you avoided the great solecism of having main return void (good for you!), you committed the C-ism of using "void" to mean "a function taking no arguments". While that's the correct way to say it in C, it's unnecessary in C++ (legal, but unidiomatic): in C++, to declare a function taking no arguments, put nothing (except possibly whitespace) between the parentheses.

Finally, while the using directive "using namespace std" is legal, it's very poor form, as it throws away any advantage of having namespaces. Either always explicitly use the namespace prefix, or at least only use what you know you're using, by using "using std::vector ;" and "using std::string".

Ok?

Now, if your question was how to make an array of strings in C, well, that's a good deal more complicated, and I'll defer answering that until and unless you confirm that's your real question.
posted by orthogonality at 7:21 AM on September 27, 2008 [12 favorites]


#include <iostream>
#include <new>
#include <string>
#include <vector>
#include <exception>

using namespace std;

int main (int argc, char** argv)
{
 try
 {
  vector<string> myStringVector(9);
  string myString;
  int index = 0;

  while (cin >> myString)
  {
   myStringVector.at(index++) = myString;
  }
 }
 catch (bad_alloc exc)
 {
  cout << "exception: " << exc.what() << endl;
 }
}
posted by Blazecock Pileon at 7:24 AM on September 27, 2008


Sorry, that should be "while ((cin >> myString) && (index < 9))" or you'll get a bounds exception. In any case, if you're using STL, I'd advise using try..catch blocks and explicitly declaring your vector's space needs up front.
posted by Blazecock Pileon at 7:27 AM on September 27, 2008


Blazecock Pileon writes "Sorry, that should be 'while ((cin >> myString) && (index < 9))' or you'll get a bounds exception."

std::vector::push_back is your friend, and lets you forget about bounds. Also, while std::vector::at will check bounds and throw, using operator [] and a good test lets you not incur the cost of run-time bounds checking.

If we need to ensure we read at most nine strings, we should use a for loop:
for( int i = 0; i <>
Also, you're not catching the out-of-bounds exception, only the bad_alloc, which you should be catching by reference (catch (std::bad_alloc& e)), not by value. Catching by value (like anything by value) requires a copy, which means std::bad_alloc( const std::bad_alloc&) has to be called; especially when catching an exception, we want to minimize anything that might do copying and possibly memory allocation.
posted by orthogonality at 7:52 AM on September 27, 2008


Argh.

If we need to ensure we read at most nine strings, we should use a for loop:
for( int i = 0; i < 9 && cin; ++i ), testing the value of i and cin's conversion to bool.
posted by orthogonality at 7:53 AM on September 27, 2008


std::vector::push_back is your friend, and lets you forget about bounds

It does, and for this small example (i.e. nine strings) capacity doesn't really matter, but depending on the STL implementation, push_back() on a full vector can cause the vector to double its capacity(), while only adding one object to the size().

For nine strings, not really an issue, but for many thousands to millions of objects, that change in capacity could be a problem. By allocating the space up front in a try..catch block, you are practically assured you will have that space available.
posted by Blazecock Pileon at 8:17 AM on September 27, 2008


Blazecock Pileon writes "For nine strings, not really an issue, but for many thousands to millions of objects, that change in capacity could be a problem. By allocating the space up front in a try..catch block, you are practically assured you will have that space available."

push_back is (amortized) constant time, but your argument for failing fast if (a known) capacity can't be met is compelling. Point taken.
posted by orthogonality at 8:33 AM on September 27, 2008


Dude, go to your prof during office hours.
posted by troy at 10:45 AM on September 27, 2008 [7 favorites]


In C, as opposed to C++, a string is just an array of char with a null (zero byte) in the last place. That's the format expected by all the standard functions that deal with strings.

Now, you could declare a string value the same way you would declare any other array:
char example[] = {'H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd', 0};
But since this syntax would drive everybody nuts even faster than the rest of C does, the following shorthand version does exactly the same thing:

char example[] = "Hello, world";

One gotcha that will trip up the unwary is that although this is a 12-character string, it's an array of 13 chars. C strings always need an extra character allocated for the null that terminates them.

The simplest C array of strings, then, is just an array of arrays of char. Say you needed an array of nine strings, and each string could be up to 15 characters long. You could declare that with something like
    char sample_array[9][16] = {
        "Zero",
        "One",
        "Two",
        "Three",
        "Four",
        "Five",
        "Six",
        "Seven",
        "Eight"
    };
Because the compiler is capable of counting the elements of an initializer, it can work out that there are nine strings there without you being so explicit about it; you could use sample_array[][16] instead of sample_array[9][16] and end up with the same data structure.

That declaration would create an in-memory structure arranged like this:
Z  e  r  o  \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
O  n  e  \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
T  w  o  \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
T  h  r  e  e  \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
F  o  u  r  \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
F  i  v  e  \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
S  i  x  \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
S  e  v  e  n  \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
E  i  g  h  t  \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
To output the string at array index 2, you could use puts(sample_array[2]) and what would come out on the console is
Two
(puts() adds the newline). To get the value of the second character of the fourth string, you'd use sample_array[3][1].

But wait a minute. If you look at the prototype for puts() in stdio.h, you will see that it's expecting its parameter to be a const char * i.e. pointer to char, not a string. Why does it work?

It works because (a) there is no such base type as "string" in C, merely a bunch of conventions for handling char arrays with zero terminators and (b) C automatically converts an expression of type "n-element array of X" into a value of type "pointer to X" when it needs to pass such a value to a function or do arithmetic on it.

To have any hope of not getting lost when thinking about arrays of strings in C, you need to know about that automatic conversion, and you also need to understand how the array indexing operator [] works. By definition,
expression[index]
is exactly equivalent to
*(expression + (index))
Given that, let's look at the sample_array[2] expression that got passed to puts() above, and work out what it actually means.
sample_array[2]
is the same as
*(sample_array + 2)
Now, sample_array is an array expression; we're about to do arithmetic on it (we're adding 2) so we need the automatic conversion to turn it into a value. The type of sample_array is "9-element array of 16-element array of char", so the result of automatic conversion will be of type "pointer to 16-element array of char" and what it will point to is the first element of sample_array: the region of memory containing
Z  e  r  o  \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
The result of adding 2 to that pointer's value is to bump it along by the size of two of the things it points to, resulting in a pointer to the third element of sample_array: the region of memory containing
T  w  o  \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
The result of dereferencing that (which is what the prefix * operator does) is an array expression identifying (rather than pointing to) the third element of sample_array. That expression is of type "16-element array of char", and it in turn undergoes automatic conversion before being passed to puts(), which ends up seeing a pointer to char, just as its prototype said it should.

Re-read everything from "To have any hope..." down to here until it makes sense before going any further.

OK. The array of strings presented above works well enough, but it wastes a lot of space. Each array element has to be at least as long as the longest string in the entire array (plus one byte for the zero terminator). As a result, it isn't often used. What you'll typically see instead is the Iliffe Vector construct that Loto talked about above. For static data, these are even easier to set up than the straight 2D array:
    char *sample_array[] = {
        "Zero",
        "One",
        "Two",
        "Three",
        "Four",
        "Five",
        "Six",
        "Seven",
        "Eight"
    };
That declaration would create an in-memory structure arranged like this:
(pointer)(pointer)(pointer)(pointer)(pointer)(pointer)(pointer)(pointer)(pointer)
Z  e  r  o  \0 O  n  e  \0 T  w  o  \0 T  h  r  e  e  \0 F  o  u  r  \0 F  i  v  e  \0 S  i  x  \0 S  e  v  e  n  \0 E  i  g  h  t  \0
All nine of those pointers are pointers to char. The first one points to the 'Z' in "Zero"; the second one to the 'O' in "One" and so on. That group of nine pointers is sample_array - an array of pointers to char - and the text itself isn't actually named. In fact, the text will probably end up in a memory region quite distant from the pointer array. Note that the text is no longer padded with extra zeroes.

The interesting thing about this structure is that references to it look exactly like references to the straight 2D array. You can still do puts(sample_array[2]), and you can still get the value of the second character of the fourth string with sample_array[3][1].

Why? Let's look at sample_array[2] being passed to puts() again.
sample_array[2]
is the same as
*(sample_array + 2)
Now, sample_array is an array expression; we're about to do arithmetic on it (we're adding 2) so we need the automatic conversion to turn it into a value. The type of sample_array is "9-element array of pointer to char", so the result of automatic conversion will be of type "pointer to pointer to char" and what it will point to is the first element of sample_array. The result of adding 2 is to bump it along by the size of two pointers-to-char, resulting in a pointer to the third element of sample_array. The result of dereferencing that pointer is the value of the third element, which is itself a pointer: to the 'T' in "Two". Because that's a pointer value, not an array value, it needs no further conversion before being passed to puts().

But what about sample_array[3][1]? Well, sample_array[3][1] means the same thing as *(sample_array[3] + 1). Since sample_array[3] contains a pointer to the 'T' in "Three", adding 1 to it will yield a pointer to the next char along (the 'h' in "Three") and the dereference will yield that 'h' itself.

In both these cases, the explicit pointer extracted from the array becomes a drop-in replacement for the implicit value that C would generate via array-to-pointer conversion in the 2D array case. Read that sentence again - it's important.

That's about all there is to pre-initialized arrays of strings in C. The hairy part comes in when you want to be generating arrays of strings at runtime (reading them in, building them from bits of other strings and so on); because there's no real support for strings in the language itself, you will generally end up spending a fair bit of time fartarsing about with memory allocation. If you're used to a language like Java that has automatic garbage collection, the fiddliness of this will drive you nuts.

To figure out the best approach, you need to think about how your strings are getting made, how long they're going to stick around, what limits there are on their lengths and so on.

If all you need to deal with is a small number of shortish strings, the best way is probably to avoid dynamic memory allocation altogether and use a static 2D array of char. You need to make the row length big enough for your biggest string, and be rigorous about checking lengths to avoid string overflows.

If you're using a small number of strings of arbitrary length, create a char *array[n] as in the second part of this answer, but without the braced initializer; allocate space for your strings in the heap using malloc(), and save the resulting pointers into the array. You need to take care with malloc() and free() to avoid invalid memory references and memory leaks, as well as doing all the bounds checking you'd need for the simpler method.

For anything bigger or more complicated, use a decent string library like bstring, and make your arrays out of the string types provided by that library. Code up a few little string-related projects without the library first, though, so you get a feel for the properties of the underlying structures and can appreciate how much work the library is actually saving you.
posted by flabdablet at 7:05 AM on September 28, 2008 [2 favorites]


flabdablet writes "Z e r o \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0"

My copy of the ANSI Standard's on another computer; does the C Standard require that, for a statically initialized char[][] array to which constant strings are assigned in the declaration, the elements beyond the null terminator be zero-initialized?
posted by orthogonality at 11:30 AM on September 28, 2008


Should be safe.

6.7.8.21:
If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.
6.7.8.10:
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
  • if it has pointer type, it is initialized to a null pointer;
  • if it has arithmetic type, it is initialized to (positive or unsigned) zero;
  • if it is an aggregate, every member is initialized (recursively) according to these rules;
  • if it is a union, the first named member is initialized (recursively) according to these rules.
This makes sense from an implementation point of view, as well. It allows the compiler to initialize automatic data structures using a simple memcpy() from an unnamed static initializer of the same type, which was itself built using the same logic used for initialized named statics.
posted by flabdablet at 7:07 PM on September 28, 2008 [1 favorite]

