Jump to content
Server Maintenance This Week. ×

Creating A Relational DB from a Flat File


Recommended Posts

  • Newbies

Good morning,

I have a flat file (Excel) that has a weekStop Export Data 2023-01-01 to 01-07.csv of data with the following columns VehicleID, VehicleName, DriverID, DriverFirstName, DriverLastName, RouteID, RouteName, PatternID, PatternName, StopID, StopName, StopNumber, Timepoint, TripID, TripName, RunID, RunName, BlockID, Arrive, ArriveVariance, ScheduledArrive, Depart, DepartVariance, ScheduledDepart, Ons, Offs, ArrivalPassengers,VehicleCapacity, and DepartingPassengers.

I created the following tables:

  • Vehicle with VehicleID, VehicleName, VehcileCapacity
  • Driver with DriverID, DriverFirstName, DriverLastName
  • Route with PatternID, PatternName, RouteID, RouteName
  • Stops with StopID, StopName, StopNumber
  • Schedule with Timepoint, TripID, TripName, RunID, RunName, BlockID, ArriveDate, ArriveTime, ArriveVariance, DepartDate, DepartTime, DepartVariance, ScheduledArriveDate, ScheduledArriveTime, ScheduledDepartDate, ScheduledDepartTime

Each table has the automated Primary Key. I am not sure how to make the relationships (ERD). Should I use the Primary Keys, or the imported IDs (VerhicleID, DriverID, etc)?

I have attached a smaller file of one day for reference, as a week file is 30mb.

I'd appreciate any help on this.

 

Thanks,

Craig

Stop Export Data 2023-05-02.xlsx

Link to comment
Share on other sites

If the imported IDs are both unique and permanent, you'd be better off using them instead of your own. This will allow you to import the flat data directly into the target tables (separately for each table, of course) and have it link up automatically.

However, this is assuming you won't be creating any records of your own in these tables. If you do, you will have to find a method to generate an ID that is guaranteed to be different from any ID that your data provider may come up with. Or move to using your own IDs - which will make the import action much more complicated.

 

 

Link to comment
Share on other sites

  • Newbies

All of the data will always be imported.

Should I create tables Trips (TripID, TripName), Runs (RunID, RunName), Blocks (BlockID) since they have their own ID, or leave them in Schedule?

  • Schedule with Timepoint, TripID, TripName, RunID, RunName, BlockID, ArriveDate, ArriveTime, ArriveVariance, DepartDate, DepartTime, DepartVariance, ScheduledArriveDate, ScheduledArriveTime, ScheduledDepartDate, ScheduledDepartTime

What key field should be used in Schedule?

Next question is how would I connect the tables?

Link to comment
Share on other sites

9 hours ago, CBarnes said:

Should I create tables Trips (TripID, TripName), Runs (RunID, RunName), Blocks (BlockID) since they have their own ID, or leave them in Schedule?

I don't know. It depends on what you intend to do with the data. The reasons for normalizing a database are functional, not ideological. If all your data comes from an outside source and you do not intend to modify it, then the most compelling reason to normalize - i.e to avoid update anomaly - is absent. 

 

9 hours ago, CBarnes said:

What key field should be used in Schedule?

IIUC, Schedule is the most atomic table here - IOW, it is a child table to several other tables, but parent to none. As such, it doesn't really need to have a primary key. Again, it depends on what your purpose is. Most often, we tend to include an auto-entered serial number or UUID field "just in case it might be needed at some point".  

 

 

Link to comment
Share on other sites

  • Newbies

The original source does not have reporting and sorting capabilities. It will be used for reporting and looking up data. For example all 56005 trips in one day, or all weekdays. Many reports plan to be built from the data. 

Link to comment
Share on other sites

6 hours ago, CBarnes said:

It will be used for reporting and looking up data.

That doesn't mean much. Some reports require (or at least can benefit from) a normalized structure, some (perhaps even most) can be produced just as easily from a flat table. It all depends on the specific details of the report.

I am not trying to dissuade you from normalizing your data. I am just saying the extra effort might not be worth it.

 

Edited by comment
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.