Yahoo FOG Group Archive

Daniel Platt

Member
Joined
Feb 7, 2007
Messages
28
As a recent Festool convert, I wasn't around for the past couple of years that the forum lived on Yahoo. I've spent the past few days reading over the old forum and found the interface very clunky (SMF is so much better!). To make things a little more friendly and available offline, I downloaded all 15,430 old forum posts into an mbox-formatted mail file. This works very well with my Thunderbird mail client (http://www.mozilla.com/en-US/thunderbird). There are other tools you can use as well, e.g., mboxview (http://mbox-viewer.sourceforge.net).
The whole archive is 81MB (17.5MB zipped). If anyone else is interested in a copy, just send me a PM. I can find a spot to host it.
- Daniel
 
Daniel,
Thank you very much for posting about this.

Moving to the SMF format has been a huge improvement over Yahoo, for sure!  But of course we had several very valuable discussions on Yahoo (the software was bad, not the discussions).  I've been looking for ways to move the message archives into the new forum so that they blend into the new forum seamlessly.  I'm also working on getting the old tool reviews and photo galleries moved into the new space.

So far, I've been encouraging members to go into the Yahoo forum and pick out their old discussions, then move them here as new discussions.  I always thought that was best, since the original author retains control over the content.  But that is a slow process.

Tell me more about the archive you have stored.  Can we upload it to the new forum?  Is it searchable?

Thanks again for your help,
Matthew
 
Matthew,
An mbox-format mailbox is just a well-formatted text file. It is just as searchable as any other text file, though how convenient that is depends heavily on the tool one uses to view the file. The archive contains all header information that was originally included in the Yahoo posting (poster, Yahoo user ID, posting date, etc.). I don't have any experience with SMF's database, but I can't see any reason that this data couldn't be uploaded into SMF. I'll poke around a bit and see what I can find.
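As a rough illustration of how searchable an mbox file is, here is a minimal Python sketch using the standard-library mailbox module. The sample messages, the search_mbox name, and the ISO-8859-1 fallback for bodies are assumptions made for this example, not details of the actual archive.

```python
import mailbox
import os
import tempfile

# Two made-up messages in mbox layout: each starts with a "From " separator line.
SAMPLE = (
    "From poster1@example.invalid Wed Sep 01 04:13:58 2004\n"
    'From: "Matthew"\n'
    "Subject: Welcome to the Group!\n"
    "\n"
    "Hello to all Festool owners!\n"
    "\n"
    "From poster2@example.invalid Thu Sep 02 09:00:00 2004\n"
    'From: "Someone"\n'
    "Subject: Dust extractor hoses\n"
    "\n"
    "Which hose fits the sander?\n"
)


def search_mbox(path, keyword):
    """Return (subject, sender) for messages whose subject or body contains keyword."""
    hits = []
    box = mailbox.mbox(path)
    try:
        for msg in box:
            subject = msg.get("Subject", "")
            # Multipart messages return None here; they are searched by subject only.
            payload = msg.get_payload(decode=True) or b""
            body = payload.decode("iso-8859-1", errors="replace")
            if keyword.lower() in (subject + "\n" + body).lower():
                hits.append((subject, msg.get("From", "")))
    finally:
        box.close()
    return hits


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".mbox", delete=False) as fh:
        fh.write(SAMPLE)
    print(search_mbox(fh.name, "hose"))
    os.remove(fh.name)
```

Pointing search_mbox at the real archive file would work the same way, although a full scan of 15,000+ messages on every query is slower than a mail client's index.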
-Daniel
 
Daniel,
I'm not an expert on databases.  Since establishing this forum, I have learned a lot, but I'm still learning more all the time.

I know there are a couple of members here with database experience.  I'd love it if one of them could jump in here and tell me if the file you created can somehow be transferred to the forum's database.

I'd really like to find an easy way to transfer all those old discussions into the new forum.  But I would want to do it so each post is a separate entry, rather than a single file with 15,000+ messages.  This might be a bit of pie-in-the-sky wishful thinking!

For those of you with technical knowledge, this forum is saved using a MySQL database.

Thanks,
Matthew
 
Matthew,
I've installed a copy of SMF and will start looking over their conversion scripts. Converting the individual emails into distinct posts won't be a problem. The one hard thing I can see will be message threading and organizing the response chains for each thread. The metadata is there to support it so it should be possible. It's just a bit of a SMOP (small matter of programming  :)).
I should have a chance to play around a bit this week. In the meantime, if anyone has any insight, please drop me a mail.
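One way to rebuild the response chains, sketched here in Python, is to follow reply links from each message back to its thread root. This assumes each message carries a Message-ID and, for replies, an In-Reply-To header; whether the Yahoo export preserves those is not confirmed here, and the dict keys below are hypothetical.

```python
from collections import defaultdict


def build_threads(messages):
    """Group messages into threads by following in_reply_to links to a root.

    `messages` is a list of dicts with "id" and "in_reply_to" (or None).
    Returns {root_id: [message ids in their original order]}.
    """
    parent = {m["id"]: m.get("in_reply_to") for m in messages}

    def root_of(msg_id):
        seen = set()  # guards against accidental reply loops
        while parent.get(msg_id) in parent and msg_id not in seen:
            seen.add(msg_id)
            msg_id = parent[msg_id]
        return msg_id

    threads = defaultdict(list)
    for m in messages:
        threads[root_of(m["id"])].append(m["id"])
    return dict(threads)


if __name__ == "__main__":
    demo = [
        {"id": "1", "in_reply_to": None},
        {"id": "2", "in_reply_to": "1"},
        {"id": "3", "in_reply_to": "2"},
    ]
    print(build_threads(demo))
```

A reply whose parent is missing from the archive becomes the root of its own thread, which is the conservative choice when the chain cannot be verified.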
Many thanks,
Daniel
 
Daniel,

Thanks...having it available online would be great....until then I have it in Thunderbird as well.
 
Daniel Platt said:
Matthew,
I've installed a copy of SMF and will start looking over their conversion scripts. Converting the individual emails into distinct posts won't be a problem. The one hard thing I can see will be message threading and organizing the response chains for each thread. The metadata is there to support it so it should be possible. It's just a bit of a SMOP (small matter of programming  :)).
I should have a chance to play around a bit this week. In the meantime, if anyone has any insight, please drop me a mail.
Many thanks,
Daniel

Are you installing SMF on a Web server?  Or are you installing it locally?  Remember, everything vital to this forum is in a MySQL database, which is stored on my Web host's server.  Can you simulate that on a local machine?

Thanks,
Matthew

PS: I run SeaMonkey, which is the Mozilla-based e-mail and browser package.
 
Matthew,
I have a web server running PHP and MySQL. It was on this box that I installed the forum software.  Below is the level of detail that I have available to me for each message. Once I determine the schema for the database, it will be pretty easy to import these messages into the MySQL database. The challenges I predict are
  • maintaining the fidelity of the threads
  • handling MIME encoded messages (those that were sent from tools like Outlook and contain rich text formatting)
  • stripping out Yahoo ad banners from the emails
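For the MIME challenge in the list above, a minimal Python sketch with the standard-library email package might look like the following. It prefers a text/plain part and falls back to stripping tags from an HTML part. The function names are made up for this example, and it does not attempt to remove Yahoo ad banners, since their exact format is not shown in this thread.

```python
import email
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects only the text between HTML tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def _decode(part):
    raw = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "iso-8859-1"
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:  # unknown charset name in the header
        return raw.decode("iso-8859-1", errors="replace")


def plain_body(raw_message):
    """Return the text/plain body if present, else the HTML body with tags removed."""
    msg = email.message_from_string(raw_message)
    html = None
    for part in msg.walk():
        if part.is_multipart():
            continue
        ctype = part.get_content_type()
        if ctype == "text/plain":
            return _decode(part)
        if ctype == "text/html" and html is None:
            html = _decode(part)
    if html is None:
        return ""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)
```

Messages sent as plain text pass through unchanged, so the same function can be applied to every message in the archive.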

I'll be in touch with more information when I get an opportunity to play around with the data.
-Daniel

Email sample:
From mschenker@-qYO13g-v-VppSzHsCKVvorYylvq0i3E27_mc8SecA_XS1TWCma59ByNkB6hGKH7CyuHdrZDTetbaKoz.yahoo.invalid Wed Sep 01 04:13:58 2004
Return-Path:
X-Sender: mattseeker@-qYO13g-v-VppSzHsCKVvorYylvq0i3E27_mc8SecA_XS1TWCma59ByNkB6hGKH7CyuHdrZDTetbaKoz.yahoo.invalid
X-Apparently-To: FestoolOwnersGroup@yahoogroups.com
Received: (qmail 67507 invoked from network); 1 Sep 2004 11:13:57 -0000
Received: from unknown (66.218.66.216)
  by m2.grp.scd.yahoo.com with QMQP; 1 Sep 2004 11:13:57 -0000
Received: from unknown (HELO n26.grp.scd.yahoo.com) (66.218.66.82)
  by mta1.grp.scd.yahoo.com with SMTP; 1 Sep 2004 11:13:57 -0000
Received: from [66.218.67.166] by n26.grp.scd.yahoo.com with NNFMP; 01 Sep 2004 11:13:07 -0000
Date: Wed, 01 Sep 2004 11:13:04 -0000
To: FestoolOwnersGroup@yahoogroups.com
Message-ID:
User-Agent: eGroups-EW/0.82
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Length: 329
X-Mailer: Yahoo Groups Message Poster
X-Yahoo-Newman-Property: groups-compose
X-eGroups-Remote-IP: 66.218.66.82
From: "Matthew"
X-Originating-IP: 24.218.58.121
Subject: Welcome to the Group!
X-Yahoo-Group-Post: member; u=165162525
X-Yahoo-Profile: mattseeker
X-Yahoo-Message-Num: 1

Hello to all Festool owners and future Festool owners!  Please feel
free to dive in and start asking questions, or share your ideas with
others.  I really want to make this group useful to Festool owners and
help out people who are thinking of buying Festool.

Looking forward to hearing your ideas!

Sincerely,
Matthew Schenker
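To show how the headers in the sample above could be turned into one record per post, here is a short Python sketch. The output keys (yahoo_profile, message_num, and so on) are hypothetical names chosen for this example and are not SMF's column names; SAMPLE is a shortened copy of the message above.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

SAMPLE = """\
Date: Wed, 01 Sep 2004 11:13:04 -0000
From: "Matthew"
Subject: Welcome to the Group!
X-Yahoo-Profile: mattseeker
X-Yahoo-Message-Num: 1

Hello to all Festool owners and future Festool owners!
"""


def to_post(raw):
    """Pull the fields a forum import would likely need out of one raw message."""
    msg = message_from_string(raw)
    return {
        "yahoo_profile": msg.get("X-Yahoo-Profile", ""),
        "message_num": int(msg.get("X-Yahoo-Message-Num", "0")),
        "subject": msg.get("Subject", ""),
        "posted_at": parsedate_to_datetime(msg["Date"]),
        "body": msg.get_payload(),
    }


if __name__ == "__main__":
    print(to_post(SAMPLE))
```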

 
Daniel,
Great to have you here in the group!  I've been working for the past three months to learn everything I can, but I'm not an expert.  So it's nice to have a member who understands databases, MySQL, and PHP files!

Please let me know what you discover!

Thank you again for your help!  I really appreciate it.

Matthew
 
Hello Matthew,
Just a quick update. I had a chance to play with the migration a bit today and did a dry run. The initial migration went very well. I was able to strip out the HTML and maintain most of the thread integrity. Some threads were split when the email subject was changed. For example, if an initial thread had a subject of:
ZOMG! Festool is the greatest
and someone's email client truncated the reply subject to
ZOMG! Festool is the great
then the latter post and its replies will show as a new thread.
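One possible way to rejoin threads split like this, sketched in Python, is to strip reply prefixes and treat a subject as matching when it is a sufficiently long prefix of the other. The min_len threshold and the function names are assumptions for this example, and the heuristic can merge unrelated topics that happen to share a long opening, so the results would need a spot check.

```python
import re

# Leading "Re:", "Fw:" or "Fwd:" prefixes, possibly repeated.
_PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def normalize(subject):
    """Lower-case the subject and drop any reply/forward prefixes."""
    return _PREFIX.sub("", subject).strip().lower()


def same_thread(original, reply, min_len=10):
    """True if the subjects match exactly or one is a truncated form of the other."""
    a, b = normalize(original), normalize(reply)
    if a == b:
        return True
    shorter, longer = sorted((a, b), key=len)
    return len(shorter) >= min_len and longer.startswith(shorter)


if __name__ == "__main__":
    print(same_thread("ZOMG! Festool is the greatest",
                      "Re: ZOMG! Festool is the great"))
```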
There were a few other things I noticed:
  • All posts were imported with their original post dates and Yahoo user names.
  • There were over 150 pages of topics in the Archive. This may not be an issue if the threads are marked read-only and are mostly accessed through a search, but I thought it was worth noting.
  • There was some loss of data in the initial import with some character translation. For example, if someone used an extended character like an en dash rather than a plain hyphen ('-'), this would be converted to a question mark.
  • The email addresses came in as a very messy, Yahoo forum specific string. I'm thinking of converting these to something generic.
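The character loss noted in the list above often comes from text that is labelled ISO-8859-1 but was actually written in Windows-1252, where an en dash is the single byte 0x96. That is only one possible cause (the database column encoding is another), so the Python sketch below is an assumption about the problem rather than a confirmed fix; decode_body is a name made up for this example.

```python
def decode_body(raw: bytes, declared: str = "iso-8859-1") -> str:
    """Decode message bytes, treating Latin-1/ASCII labels as Windows-1252."""
    charset = declared.lower()
    if charset in ("iso-8859-1", "latin-1", "us-ascii", "ascii"):
        try:
            # Windows-1252 is a superset of printable Latin-1 and maps 0x96 to an en dash.
            return raw.decode("cp1252")
        except UnicodeDecodeError:
            # A few bytes are undefined in cp1252; Latin-1 accepts every byte.
            return raw.decode("latin-1")
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        return raw.decode("latin-1")


if __name__ == "__main__":
    print(decode_body(b"10 \x96 20 mm"))
```

The decoded text would still need to be stored in a column and connection encoding that can hold the en dash, otherwise the question marks reappear at insert time.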
Now that I've proved that migration is possible, I'll start working my way through the issues. I should be able to host the resulting data somewhere for you to verify before we start discussing how to get the data into the production forum. The data is currently running on a VMware session on my laptop and isn't very shareable.  :D
More information to come soon.
- Daniel
 
Daniel,

I'm interested in working on this too.  I don't have any experience with MySQL, but I've been a programmer for 35 years and a SQL Server developer for the last 15.  I'm kinda maxed right now, but I might be able to help with metadata and things like string parsing routines.

I can think of two issues:

1) Mapping old user names to new user names.  Part of the issue would be the number of users involved.  Another part of the issue is PII (Personally Identifiable Information).  To do an accurate mapping might require PII.

2) Protecting the current database.  Obviously, we'd need to test this thoroughly on a test DB after it was carefully backed up.  Assuming that the data could be successfully imported, I'm wondering if it would be best to add it to a separate "Archive" section of the forum.  Then users could search for and find their posts at their leisure, and then post them to a "live" thread.

Just thinking out loud here.

Regards,

Dan.
 
Dan and Daniel,
Thank you both for your expertise here!  As I have said, I'm learning fast when it comes to databases, but no way could I match your extensive knowledge.  So it's great to have both of you working to crack this case!

I'll tell you something else...I've heard from several people who are in the same situation -- trying to transfer an old Yahoo database over to SMF.  If you make it work here, you could be a hero to a lot more than just this forum!  Of course, you belong to us, but we might lend you out to others!

I know this is not an easy task, and I appreciate the level of detail involved.  So take your time.  Keep us posted in this discussion on your progress.

Dan's idea of creating a temporary "Archive" board to house the old posts sounds like a great way to go.  When you're ready for this, let me know and I'll create it.

Thank you again,
Matthew
 
Dan,
Thank you very much for the offer of help. As soon as I get time to work through some of the conversion issues and have a thread hosted ready to share, I'll let you know. I've had a great deal of experience with text munging with a variety of tools (awk, REXX, perl, etc.) so I'm OK with the string processing. However, what I could use is your guidance with the migration and identifying any issues with the test DB before we move to production.
wrt your comments,
(1) You are correct to consider the PII concerns. That said, all data contained in the Yahoo forum (IP address, email address, name, etc.) are publicly available. Many of the frequent posters (Matthew, John L., Per, Bill E., etc.) are easily identifiable and can easily be updated on import to map across to their new forum accounts. However, since it would be difficult to do this for everyone, I was just thinking that the Archived messages would simply be posted with the old Yahoo accounts and appear in SMF as a guest poster. The option to migrate user posters is there, but I will leave it to Matthew to decide how he would like to handle this.
(2) I will be testing the migration process by going from my local SMF instance to my personal, third-party hosted SMF instance. Backups, planned downtime and a tested recovery plan are mandatory. We have some work to do before we get to that point.

Matthew,
I too was thinking that all the old messages would be in an Archive forum. I was also thinking these would be locked read only, but that's your call and subject to the capabilities of SMF.
I'll be sure to document what I do to make this migration work so you can share with your counterparts. It'll not be a terribly user-friendly conversion, but it is possible.

Thank you.
-Daniel
 
Daniel,

Strongly agree on the read-only.    Given the number of messages, that could be a big issue unless it can be set globally. 

Regarding the mapping of Yahoo users to current FOG users...  Assuming that we can parse the Yahoo user names precisely, I'd suggest:

1) Create a "YahooFOGMap" mapping table containing two columns - "YahooUserName" and "FOGUserName". 

2) Insert the Yahoo User Names into the "YahooUserName" column.

3) If possible, identify the most prodigious Yahoo posters - Matthew of course, John Lucas, Jerry Work, Per, and several others.  We can map the top 10-20 pretty quickly.  Update those users with their FOG names.

4) Retrieve a list of users with an "ORDER BY YahooUserName" clause.

5) Send list to Matthew, who posts the list and promotes the heck out of it.  Each current FOG user is supposed to find their Yahoo name and respond to the post with their Yahoo name + a separator value (tilde or something easily identifiable) + FOG name.

6) After some period of time, Matthew runs a SQL query to retrieve the posts into a format that can be easily parsed.

7a) Create a Default Migration User with a name like "Yahoo Migration User" for those posts with no mapped FOG name.

7b) We parse the list and use it to update the YahooFOGMap mapping table.  Hopefully the active members will have updated the list and we'll have most of the key Yahoo posts covered.  Then we run a query that sets the FOG name to "Yahoo Migration User" where it's NULL, something like:
  UPDATE YahooFOGMap
  SET FOGUserName = 'Yahoo Migration User'
  WHERE FOGUserName IS NULL;


That's ANSI SQL, but I'm not sure if MySQL has any issues with that.

8) We run a migration program that 1) converts the original posts into the FOG format, 2) updates the original user names to the FOG user names, and 3) shoves the data into the correct FOG MySQL tables (TEST SERVER!!!) in a relationally consistent format.

9) We test to ensure a valid conversion.  RI should be correct, data is not corrupted, and posts are assigned to the correct user.  More importantly, we need to ensure that the CURRENT system data and metadata is still valid. 

10) Once we've ensured that all is well, we can create a complete rollout plan including detailed steps for system downtime (mandatory), system backup, data migration, validation testing, and system startup.  We'd also need a set of rollback steps just in case.
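The mapping-table steps above (1, 2, 5, 7a and 7b) can be sketched end to end in Python. This uses an in-memory SQLite database purely as a stand-in for MySQL so the example is runnable anywhere; the build_map function, the user names in the demo, and the tilde separator are assumptions taken from or invented around the proposal, not a tested migration script.

```python
import sqlite3


def build_map(yahoo_names, replies, default="Yahoo Migration User"):
    """Build the YahooFOGMap table and return its rows ordered by Yahoo name.

    `replies` are lines of the form "YahooName~FOGName" collected from members.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE YahooFOGMap ("
        " YahooUserName TEXT PRIMARY KEY,"
        " FOGUserName TEXT)"
    )
    # Step 2: load every Yahoo user name with no FOG name yet.
    conn.executemany(
        "INSERT INTO YahooFOGMap (YahooUserName) VALUES (?)",
        [(name,) for name in yahoo_names],
    )
    # Steps 5 to 7b: apply each well-formed member reply, skipping malformed lines.
    for line in replies:
        yahoo, sep, fog = line.partition("~")
        if sep and yahoo.strip() and fog.strip():
            conn.execute(
                "UPDATE YahooFOGMap SET FOGUserName = ? WHERE YahooUserName = ?",
                (fog.strip(), yahoo.strip()),
            )
    # Step 7b: everyone still unmapped gets the default migration user.
    conn.execute(
        "UPDATE YahooFOGMap SET FOGUserName = ? WHERE FOGUserName IS NULL",
        (default,),
    )
    rows = conn.execute(
        "SELECT YahooUserName, FOGUserName FROM YahooFOGMap ORDER BY YahooUserName"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(build_map(["yahoo_alice", "yahoo_bob"], ["yahoo_alice ~ Alice"]))
```

Using query parameters instead of building SQL strings by hand also avoids trouble with user names that contain quotes.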

Some initial thoughts.

Regards,

Dan.

p.s., It turns out that I'm currently in charge of the one-time migration process of a major new system for my customer.    After a couple of weeks of coding and tweaking, the migration program (currently about 15 stored procedures) migrates slightly more than 1 billion rows of data, including derivations and data cleansing, in 121 minutes.  (I'm a happy guy.)
 
Dan and Daniel,
See, you are already over my head!

I've just been surfing around in the database software to try and understand better about creating columns and tables and running queries.

Some information and questions for you as you work on the Yahoo-to-SMF migration:
1. My account gives me access to an application called cPanel (v. 10.9.0-STABLE-9966), where I can do an amazing array of things with the data in this forum.  Usually, I leave it alone, as I do not want to break anything.  Are you familiar with cPanel?
2. My account gives me an unlimited number of MySQL databases.  The forum is currently using one database.  Would it make sense to create a new database for the Yahoo messages?  Would it be better to install another instance of SMF on my server where we could test the migration?  Or is it better to have an "Archive" board here in the production forum?
3. Below are some general details about the server that hosts this forum:
    - Running several Perl modules
    - Apache version 1.3.37 (Unix)
    - MySQL version 4.1.21-standard-log
    - PHP version 5.2.1 (3/11/07 update)
    - PERL version 5.8.7

Just so you know, I do a full backup of this forum every day.

Does any of this help?  If there are other details you need, please ask me and I'll retrieve them.

It would be great to see a smooth migration of old posts into the current forum.  Even if they were all in one board, members would probably find it easier to slowly move their Yahoo posts into the correct place in the new forum.

Thanks again for doing this!

Stay in touch,
Matthew
 
Matthew,
I have cPanel on my hosted web site as well. Though I haven't used it a great deal. Your configuration is a standard LAMP (Linux/Apache/MySQL/PHP) stack and is what I'm running on my laptop and my web site. This weekend I'll work through the issues I listed earlier and play around with the account synchronization. Once I'm happy with the quality of the result, I'll export the data from my local MySQL database and load it into my web site for review.
I think it would make the most sense to have the Yahoo messages as a board within the production forum. This would allow people to search in one place for the entire FOG repository (old and new).
It would be very helpful for the migration exercises if you could install another test instance of SMF and populate it with a point in time copy of the production forum. I can easily test the data import to an empty instance of SMF, but this would not have any of the real members to test the account synch. Importing to a copy of the live instance will allow us to give the process proper testing. I can host the test instance if that is more convenient. Just let me know.

Dan,
Thank you very much for your thoughts. This is exactly what I had in mind. (Well, not exactly. I was going to write a perl script to read the mapping file and perform the user name substitutions before the import. The end result should be the same.) I still have some learning to do about SMF before I finalize the process, but you already have a firm grasp on the high level flow we will need to follow.
btw, Good luck on your data migration. That's a pretty impressive time for the data you're processing! Congratulations, you are very right to be happy.  ;D

-Daniel
 
It would be very helpful for the migration exercises if you could install another test instance of SMF and populate it with a point in time copy of the production forum.

Just want to confirm what you mean.  Are you asking me to install another instance of SMF, and copy the databases from the production forum into it?  In essence, I'd be creating a clone of the production forum, with all the messages, attachments, and everything else in it?

Thanks,
Matthew
 
Matthew,
Yes sir, if at all possible. This is really the only way to fully test the import and confirm there is no negative impact. It is nothing that you would need to keep up to date or back up, just a one-time snapshot.
Many thanks,
Daniel
 
A few months ago I was able to browse and search the old Yahoo group at http://groups.yahoo.com/group/FestoolOwnersGroup.  The group had a lot of interesting articles, and it was a good source of information.

Unfortunately, it now seems that the group has disappeared. Perhaps Yahoo deletes groups after a certain period of inactivity?

If that's the case, and it's no longer accessible, then I think it would be a good idea to revisit the idea of making the old posts accessible somehow. Hopefully Daniel still has that 81MB archived version up his sleeve!

Forrest
 
Forrest,
I'm not too happy about that!

Here's what happened...I was still getting notices as new people continued to sign up for the Yahoo group.  I'd get follow-up e-mails from new members asking me what is wrong with the group and why can't they post.  So, I got an idea.  Maybe I could find a way to automatically redirect anyone who signs up for the Yahoo group over to this site?  Sounds good, right?  Well, I contacted Yahoo asking if they could in fact redirect members.  I figured it was a longshot, but wanted to try.  I was very surprised when I received an e-mail from Yahoo letting me know that they could in fact do a redirect.  We agreed on it, and I told them to go ahead.

The next day, the whole Yahoo group was disabled.

I've written to Yahoo to ask what happened, but have heard nothing from them.

It's upsetting to me, as I always hoped I could find a way to bring all those old posts over here to the new forum.  Now I'll have to see if in fact Daniel does have that archive!

Oh, Daniel?

Matthew
 