
Answered Question

 
November 03, 2009 03:19 AM

How to import a book into a database, 140 characters at a time

So here's the deal. I have a huge plain text file (about 650K). I need to chop the book into 140-character sections, and each section needs to be a separate record in a database. Obviously, they should be sequential. Additionally, I would like to keep words whole, so if a word crosses the 140-character boundary it won't get chopped up. In the end, this data needs to be in a MySQL database, but any intermediate format is fine (CSV, Excel, Access database, or other), as I can handle moving it from there to MySQL.

I'm looking for ideas on how to do this without getting too deep into programming. What are your thoughts?

One possibility: if there were some command I could run from the command line (Terminal, OS X) that would take the file and send 140-character chunks to stdout, then I could just pipe that to a file... but I don't know what command to use.

Best Answer  Chosen by Asker

 
November 04, 2009 05:50 AM
awk and/or sed are probably going to be your friends in this.

Another option is to first separate the text into words, then aggregate words into a string until adding the next word would push it past 140 characters. Output the line and repeat.
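The word-aggregation idea above can be sketched in a few lines of Python. This is only an illustration, not something the answer itself supplied: the function name `chunk_words` and the `limit` parameter are made up for the example, and it assumes that a single word longer than the limit should be kept whole rather than cut.

```python
def chunk_words(text, limit=140):
    """Greedily pack whole words into chunks of at most `limit` characters."""
    chunks = []
    current = ""
    for word in text.split():  # split on any run of whitespace, including newlines
        if not current:
            # Start a new chunk; a word longer than `limit` is kept whole (assumption)
            current = word
        elif len(current) + 1 + len(word) <= limit:
            # The word still fits after adding one separating space
            current += " " + word
        else:
            # The word would overflow: emit the chunk and start the next one
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"
    for number, chunk in enumerate(chunk_words(sample, limit=15), start=1):
        print(number, chunk)
```

Because the chunks come back as an ordered list, numbering them with `enumerate` gives the sequence needed for the database records.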

Tip tenasty for this answer
November 04, 2009 06:06 AM
I forgot to mention Perl. With Perl you should be able to use the split and join functions to achieve most of this.
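For comparison, here is a rough equivalent of the split-and-join approach in Python rather than Perl (a substitution made for this example, not what the answer proposed). It uses split and join to normalize whitespace and then hands the wrapping to the standard library's `textwrap.wrap`. The function name `wrap_text` and the choice to leave over-long and hyphenated words unbroken are assumptions.

```python
import sys
import textwrap


def wrap_text(text, limit=140):
    # split() + join() collapses newlines and repeated spaces into single spaces
    normalized = " ".join(text.split())
    # break_long_words=False and break_on_hyphens=False keep every word whole
    return textwrap.wrap(
        normalized,
        width=limit,
        break_long_words=False,
        break_on_hyphens=False,
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read the file named on the command line and print one chunk per line
    with open(sys.argv[1], encoding="utf-8") as handle:
        for line in wrap_text(handle.read()):
            print(line)
```

Saved under a hypothetical name such as `wrap140.py`, it could be run as `python3 wrap140.py book.txt > chunks.txt`, which produces one chunk per line in the output file.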


Other Answers (2)

November 03, 2009 06:33 AM
I think VI/VIM is your answer.
It looks like you know some Linux, so you're very close.
Source(s):
http://vimdoc.sourceforge.net/htmldoc/usr_toc.html


Tags: vi, vim


Tip crys for this answer
November 03, 2009 02:14 PM
It sounds like you are going to do what Proustr is doing: twittering Proust. He basically outlines how he processes the text here:

http://www.omgtldr.com/proustr/

A simpler way might be to do one sentence at a time, but the problem is that some sentences are more than 140 characters. If you are interested in the sentence idea I think it would work kind of like this: In MS Word you can do the replace function and replace "period space" with "period space tab" and then export the file into Excel using a tab delimited setting. Then just save it as a CSV for import into MySQL. This would almost give you what you want, but not quite. I know it's an ugly solution, so I'm sure you will find a better one.
Source(s):
http://twitter.com/proustr
http://www.omgtldr.com/proustr/
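The sentence-at-a-time idea, including the CSV step, could also be scripted instead of going through Word and Excel. The sketch below is an illustration only: the function name `sentences_to_csv`, the column names, and the `too_long` flag are invented for the example, and the sentence split is deliberately naive (it will also split after abbreviations such as "Mr.").

```python
import csv
import io
import re


def sentences_to_csv(text, out, limit=140):
    """Write one CSV row per sentence: sequence number, over-limit flag, text."""
    normalized = " ".join(text.split())
    # Naive split: break at a space that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?]) ", normalized)
    writer = csv.writer(out)
    writer.writerow(["seq", "too_long", "text"])
    seq = 0
    for sentence in sentences:
        if not sentence:
            continue  # skip the empty string produced by empty input
        seq += 1
        # Flag sentences over the limit so they can be handled separately
        writer.writerow([seq, int(len(sentence) > limit), sentence])


if __name__ == "__main__":
    buffer = io.StringIO()
    sentences_to_csv("It was late. Very late! Was it?", buffer)
    print(buffer.getvalue())
```

The `seq` column preserves the original order, and the `too_long` column marks the sentences that exceed 140 characters, which is the case this approach does not solve by itself.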



Tip phlogiston for this answer
