Code review: Can this be simplified?

Ok, finally I’ve got a working solution – but it took me quite some time. Sometimes it feels as if I get lost in a jungle of code I produced myself while the solution has always been the tree next to me. Or is this just what it’s like? I would appreciate it if anyone experienced could have a look at my code and tell me whether that’s reasonable complexity or wilderness.

This is the task:

I have an array of train journeys that I want to visualize in an SVG. That is already working. Currently, some trains share the same route, but one terminates at a more distant destination than the other. In this case I would have two lines on top of each other, which I’d like to avoid in order to have one clickable line group. Therefore I want to filter out the shorter, duplicate train lines.

This is what a part of my array looks like:

const trainLines = [
[{name: 'Berlin Hbf', pos: ['1202.6', '392.5'], dur: 0}, {name: 'Königs-Wusterhausen', pos: ['1220.5', '415.6'], dur: 18}, {name: 'Lübben (Spreewald)', pos: ['1250.1', '458.1'], dur: 44}, {name: 'Cottbus Hbf', pos: ['1277.5', '476.8'], dur: 67}],
[{name: 'Berlin Hbf', pos: ['1202.6', '392.5'], dur: 0}, {name: 'Königs-Wusterhausen', pos: ['1220.5', '415.6'], dur: 18}]
// and so on...
]

Here I would like to find the second array in my trainLines array and filter it out. Can’t be that complicated :crazy_face:

This is my code:

const reducer = (acc, e) => acc + ' ' + e; 
const joinNames = el => el.map(item => item.name).reduce(reducer); // concatenating the name values of each stop per train line and turning them to a string
const uniqueJourneys = trainLines.filter(el => !(items.map(elmnt => joinNames(elmnt))
            .find(item => item.includes(joinNames(el)) && item.length > joinNames(el).length)));

Does anyone have an idea if there is a more efficient and more readable solution? Are there any conventions I ignored – like using iterators inside iterators?

Just a few loose notes.


Whenever you feel the urge to add a comment to a one-liner, it might be a good idea to rename the function instead; there is nothing wrong with names like getConcatenatedNameValues. Names of variables, fields, and classes are a great place to convey the meaning and goal of a given entity.
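
As a small illustration of the naming point, here is a sketch of the helper from the question with a more descriptive name, so the trailing comment is no longer needed. The name `getJoinedStopNames` is only a suggestion, and `join(' ')` is swapped in for the `reduce` call (it gives the same result for non-empty routes).

```javascript
// Hypothetical rename of `joinNames` from the question.
const getJoinedStopNames = (trainLine) =>
  trainLine.map((stop) => stop.name).join(' ');

const exampleLine = [
  { name: 'Berlin Hbf', pos: ['1202.6', '392.5'], dur: 0 },
  { name: 'Königs-Wusterhausen', pos: ['1220.5', '415.6'], dur: 18 },
];

console.log(getJoinedStopNames(exampleLine));
// prints "Berlin Hbf Königs-Wusterhausen"
```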


When you have data that is already structured and you want to convert it to a string to perform searching or filtering it might be a bad idea because string operations are usually quite slow. Let’s say that these are our train lines:

const trainLines = [
    [
        {
            name: 'Berlin Hbf',
            pos: ['1202.6', '392.5'],
            dur: 0
        }, {
            name: 'Königs-Wusterhausen',
            pos: ['1220.5', '415.6'],
            dur: 18
        }, {
            name: 'Lübben (Spreewald)',
            pos: ['1250.1', '458.1'],
            dur: 44
        }, {
            name: 'Cottbus Hbf',
            pos: ['1277.5', '476.8'],
            dur: 67
        }
    ], [
        {
            name: 'Königs-Wusterhausen',
            pos: ['1220.5', '415.6'],
            dur: 0
        }, {
            name: 'Lübben (Spreewald)',
            pos: ['1250.1', '458.1'],
            dur: 26
        }
    ]
];

In this case, the computer has to spend time figuring out whether the string 'Königs-Wusterhausen Lübben (Spreewald)' is part of 'Berlin Hbf Königs-Wusterhausen Lübben (Spreewald) Cottbus Hbf'. But something got lost in the conversion: the string no longer tells us where one stop ends and the next begins, so the computer has to check each index. (Not really, depending on the algorithm, but it still has to establish that 'Königs-Wusterhausen Lübben (Spreewald)' != 'lin Hbf Königs-Wusterhausen Lübben (Sp'.) If we stick to the arrays we never make that comparison, because we know where the name of each train station begins.
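
A minimal sketch of the array-based comparison described above, assuming the routes have already been reduced to arrays of station names. The function name `isContiguousSubroute` is made up for this example; it only ever compares whole station names, so no comparison can start in the middle of a name.

```javascript
// Returns true if `shorter` appears as a contiguous run of stops inside `longer`.
function isContiguousSubroute(shorter, longer) {
  if (shorter.length === 0 || shorter.length > longer.length) return false;
  for (let start = 0; start + shorter.length <= longer.length; start++) {
    // Compare stop by stop, beginning at position `start` of the longer route
    const matches = shorter.every((name, offset) => longer[start + offset] === name);
    if (matches) return true;
  }
  return false;
}

const fullRoute = ['Berlin Hbf', 'Königs-Wusterhausen', 'Lübben (Spreewald)', 'Cottbus Hbf'];
const partialRoute = ['Königs-Wusterhausen', 'Lübben (Spreewald)'];

console.log(isContiguousSubroute(partialRoute, fullRoute)); // true
console.log(isContiguousSubroute(['Cottbus Hbf', 'Berlin Hbf'], fullRoute)); // false
```

Unlike the string version, this check also respects the order of the stops.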


Try to check how many times your code executes the joinNames function and try to calculate how many times it would really be necessary. I know that it’s pretty fun to write short code, but in this case it definitely makes sense to add some sort of “pre-processing” stage where you calculate the concatenated name once for every train line.
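
A rough sketch of such a pre-processing stage, using a shortened version of the sample data (only the names). The counter `joinCalls` is added purely to show how often the names get joined; in the original snippet the number of calls grows roughly with the square of the number of train lines, here it is one call per line.

```javascript
const trainLines = [
  [{ name: 'Berlin Hbf' }, { name: 'Königs-Wusterhausen' }, { name: 'Lübben (Spreewald)' }, { name: 'Cottbus Hbf' }],
  [{ name: 'Berlin Hbf' }, { name: 'Königs-Wusterhausen' }],
];

let joinCalls = 0; // counts how often the names are joined
const joinNames = (line) => {
  joinCalls++;
  return line.map((stop) => stop.name).join(' ');
};

// Pre-processing: join the names once per train line
const joinedNames = trainLines.map(joinNames);

// Filtering only works with the cached strings
const uniqueJourneys = trainLines.filter((_, i) =>
  !joinedNames.some((other) =>
    other.includes(joinedNames[i]) && other.length > joinedNames[i].length
  )
);

console.log(joinCalls); // 2, one call per train line
console.log(uniqueJourneys.length); // 1
```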


“Sometimes it feels as if I get lost in a jungle of code I produced myself while the solution has always been the tree next to me.”

I have a funny feeling that a tree :smiley: structure or a graph might be a brilliant solution to your problems. Can you tell us a bit more about your project? How do you want to handle situations when there is a partial overlap between train lines? Or what about train lines in reversed order?


edit

I would also like to note that converting your data into string format based solely on the train station name might lead to some problems given that in the real world we can find multiple instances of train station names that are already part of the name of another train station.

For example:

I assume that this does not happen in your data but I just wanted to point out how problematic it might be to get rid of the structure of your data. And it might be a good challenge for you to try to handle situations like this :slight_smile:
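
A small sketch of that pitfall, using 'Wittenberg' and 'Wittenberge', a pair of station names that comes up later in this thread. The two routes themselves are made up for illustration.

```javascript
const joinNames = (line) => line.map((stop) => stop.name).join(' ');

// Two made-up routes that end in different towns
const toWittenberge = [{ name: 'Berlin Hbf' }, { name: 'Wittenberge' }];
const toWittenberg = [{ name: 'Berlin Hbf' }, { name: 'Wittenberg' }];

// String comparison: 'Wittenberg' is a prefix of 'Wittenberge'
const stringSaysDuplicate = joinNames(toWittenberge).includes(joinNames(toWittenberg));

// Array comparison: only whole names are compared
const namesOfLonger = toWittenberge.map((stop) => stop.name);
const arraySaysDuplicate = toWittenberg.every((stop) => namesOfLonger.includes(stop.name));

console.log(stringSaysDuplicate); // true (a false positive)
console.log(arraySaysDuplicate); // false
```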

Wow, thank you very much for your detailed answer, Maciej! :hugs:
Very good point that using string concatenation for comparison takes too long because of the many operations involved. You caught me: I focussed too much on having the shortest possible code while neglecting the processes running in the background. I will try to find a way that just uses the array data, and count the number of operations necessary.

I’m planning a website that illustrates how far you can get in a given time from a specific starting station with regional trains. The time can be chosen by the user with a range slider. For a start, the starting station will just be Berlin because the data is handwritten. Then I want to show which official cycling routes run along the train line.

Currently I have handwritten train data (unfortunately I couldn’t find a public API providing this data) looking like this:

const dbRoutes =  {
    "RE1": [{name: "Magdeburg Hbf", stop: true, dur: 0 }, {name: "Magdeburg-Neustadt", stop: false, dur: 3 }, {name: "Burg\xa0(Magdeburg)", stop: false, dur: 12 }, {name: "Güsen", stop: false, dur: 7 }, {name: "Genthin", stop: false, dur: 8 }, {name: "Wusterwitz", stop: false, dur: 9 }, {name: "Kirchmöser", stop: false, dur: 4 }, {name: "Brandenburg Hbf", stop: true, dur: 6 }, {name: "Werder\xa0(Havel)", stop: false, dur: 15 }, {name: "Potsdam Hbf", stop: true, dur: 10 }, {name: "Berlin-Wannsee", stop: false, dur: 9 }, {name: "Berlin-Charlottenburg", stop: false, dur: 9 }, {name: "Berlin Zoologischer Garten", stop: false, dur: 4 }, {name: "Berlin Hbf", stop: true, dur: 5 }, {name: "Berlin Friedrichstraße", stop: false, dur: 5 }, {name: "Berlin Alexanderplatz", stop: false, dur: 3 }, {name: "Berlin Ostbahnhof", stop: false, dur: 5 }, {name: "Berlin Ostkreuz", stop: false, dur: 2 }, {name: "Erkner", stop: false, dur: 16 }, {name: "Fangschleuse", stop: false, dur: 5 }, {name: "Fürstenwalde (Spree)", stop: true, dur: 8 }, {name: "Berkenbrück", stop: false, dur: 6 }, {name: "Briesen", stop: false, dur: 5 }, {name: "Jacobsdorf", stop: false, dur: 5 }, {name: "Pillgram", stop: false, dur: 3 }, {name: "Frankfurt-Rosengarten", stop: false, dur: 4 }, {name: "Frankfurt\xa0(Oder)", stop: true, dur: 5 }], // and many more
}

I slice these routes at the start destination so I have two routes: Berlin - Magdeburg and Berlin – Frankfurt (Oder)
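
A sketch of how such a split could look, on a short excerpt of the RE1 data above. It assumes that `dur` is the travel time in minutes from the previous stop; the function name `splitAtStation` and the variable names are made up for this example.

```javascript
// Excerpt of RE1; the first stop's `dur` is set to 0 because the excerpt starts there.
const re1Excerpt = [
  { name: 'Berlin-Charlottenburg', stop: false, dur: 0 },
  { name: 'Berlin Zoologischer Garten', stop: false, dur: 4 },
  { name: 'Berlin Hbf', stop: true, dur: 5 },
  { name: 'Berlin Friedrichstraße', stop: false, dur: 5 },
  { name: 'Berlin Alexanderplatz', stop: false, dur: 3 },
];

function splitAtStation(route, startName) {
  const startIndex = route.findIndex((stop) => stop.name === startName);
  if (startIndex === -1) return [];

  // Listed direction: the start station, then everything after it
  const forward = route
    .slice(startIndex)
    .map((stop, i) => ({ ...stop, dur: i === 0 ? 0 : stop.dur }));

  // Opposite direction: walk back towards the first stop. The time for the
  // segment between stop k and stop k + 1 is stored on stop k + 1.
  const backward = [];
  for (let k = startIndex; k >= 0; k--) {
    const dur = k === startIndex ? 0 : route[k + 1].dur;
    backward.push({ ...route[k], dur });
  }

  // A route that starts or ends at the start station yields only one branch
  return [backward, forward].filter((branch) => branch.length > 1);
}

const [westbound, eastbound] = splitAtStation(re1Excerpt, 'Berlin Hbf');
console.log(westbound.map((s) => `${s.name} (+${s.dur})`).join(' > '));
// prints "Berlin Hbf (+0) > Berlin Zoologischer Garten (+5) > Berlin-Charlottenburg (+4)"
console.log(eastbound.map((s) => s.name).join(' > '));
// prints "Berlin Hbf > Berlin Friedrichstraße > Berlin Alexanderplatz"
```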

Currently I just render direct routes. In a later step, I would like to combine train routes, so I have destinations that can be reached at the chosen time either directly or with one or more changes. Then maybe one may reach Poznań once it’s ready :slight_smile:

That sounds like a really cool project! :rocket:

Hm, I don’t have much experience with transportation data, but you might be interested in researching GTFS (General Transit Feed Specification). It is a format that originated at Google and is used to share transit data.

OpenMobilityData has a few publicly available GTFS feeds from Germany (see the Feeds page on OpenMobilityData). It might be worth checking out.

And as for the data structure to hold data, I would look into graphs. This will make it much easier for you to find travel duration when more than one train line will have to be used. And the duplication problem would just become obsolete :slight_smile:

Well, good luck with your project and I hope to see some updates :slight_smile:

Thanks a lot! I will definitely look into this.

Not sure what you mean by graphs tbh, I guess you’re talking about something coding related rather than visual graphs? :thinking:

I had the idea for this project when thinking about the final project for the Frontend path, so you’ll very likely see the result eventually (though it might take some time, I’m currently at 81% and progress is getting much slower :slight_smile: )
Cheers!

Oh, sorry about that.

A graph in computer science is an abstract data type used to represent structures that resemble visual graphs - so it is a data structure that holds information about vertices (train stations) and connections between those vertices - edges (train lines).

Using this data structure will allow you to quickly find out where you can go from the current location or to find the shortest route from location A to location B.
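
To make this a bit more concrete, here is a rough sketch of a graph built from route data, plus a Dijkstra-style search that lists every station reachable within a time budget. It is only an illustration under a few assumptions: `dur` is the travel time in minutes from the previous stop, trains run in both directions, and changing trains costs no time. The line names and the function names `buildGraph` and `reachableWithin` are made up; the durations are summed up from the data posted in this thread.

```javascript
const exampleRoutes = {
  lineA: [
    { name: 'Potsdam Hbf', dur: 0 },
    { name: 'Berlin Hbf', dur: 27 },
    { name: 'Erkner', dur: 31 },
  ],
  lineB: [
    { name: 'Berlin Hbf', dur: 0 },
    { name: 'Königs-Wusterhausen', dur: 18 },
    { name: 'Lübben (Spreewald)', dur: 26 },
    { name: 'Cottbus Hbf', dur: 23 },
  ],
};

// Adjacency list: station name -> Map(neighbour name -> minutes)
function buildGraph(routes) {
  const graph = new Map();
  const addEdge = (from, to, minutes) => {
    if (!graph.has(from)) graph.set(from, new Map());
    const known = graph.get(from).get(to);
    if (known === undefined || minutes < known) graph.get(from).set(to, minutes);
  };
  for (const stops of Object.values(routes)) {
    for (let i = 1; i < stops.length; i++) {
      addEdge(stops[i - 1].name, stops[i].name, stops[i].dur);
      addEdge(stops[i].name, stops[i - 1].name, stops[i].dur);
    }
  }
  return graph;
}

// Dijkstra-style search with a simple linear scan for the closest open station
function reachableWithin(graph, start, maxMinutes) {
  const best = new Map([[start, 0]]);
  const done = new Set();
  while (true) {
    let current = null;
    for (const [station, minutes] of best) {
      if (!done.has(station) && (current === null || minutes < best.get(current))) {
        current = station;
      }
    }
    if (current === null) break;
    done.add(current);
    for (const [next, minutes] of graph.get(current) ?? []) {
      const total = best.get(current) + minutes;
      if (total <= maxMinutes && (!best.has(next) || total < best.get(next))) {
        best.set(next, total);
      }
    }
  }
  return best; // station name -> shortest travel time in minutes
}

const graph = buildGraph(exampleRoutes);
const reachable = reachableWithin(graph, 'Berlin Hbf', 45);
console.log([...reachable]);
```

With a 45 minute budget, Lübben (44 minutes) is still included while Cottbus (67 minutes) is not. Each connection between two stations is stored only once, no matter how many train lines use it.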

But because you are on the frontend path I am not sure how deep you want to go on algorithmic challenges. See the Wikipedia article about the graph data structure.

If you limit the number of stations and train lines it might be okay to just use the arrays in the form you already defined. But unfortunately, I have a feeling that some of the features you plan on making might require too much time without using the right data structure.

Interesting problem. Your code is giving a reference-error: ReferenceError: items is not defined

That said, I have to agree with @factoradic that using graphs has to be considered for this but the higher complexity of that has to be weighed against what the planned functionality is and if it is required or not. If you decide to explore graphs more then this chapter/exercise is an interesting introduction: Project: A Robot :: Eloquent JavaScript

On another note, maybe it’s an option to paint overlapping train lines? Many maps of subways and trains are painted that way, with parallel lines. After all, you may have a train from Munich-Frankfurt-Berlin and another line from just Frankfurt-Berlin. I would think that drawing all lines is preferable if click functionality is added, times are presented and so on. Otherwise, some routes would perhaps not be visible. Just another consideration that you probably already thought about…

Oops, yes. I renamed items to trainLines on the forum because that variable name is more telling and forgot to rename the second instance of it. This snippet is working:

const trainLines = [
[{name: 'Berlin Hbf', pos: ['1202.6', '392.5'], dur: 0}, {name: 'Königs-Wusterhausen', pos: ['1220.5', '415.6'], dur: 18}, {name: 'Lübben (Spreewald)', pos: ['1250.1', '458.1'], dur: 44}, {name: 'Cottbus Hbf', pos: ['1277.5', '476.8'], dur: 67}],
[{name: 'Berlin Hbf', pos: ['1202.6', '392.5'], dur: 0}, {name: 'Königs-Wusterhausen', pos: ['1220.5', '415.6'], dur: 18}]
// and so on...
]
const reducer = (acc, e) => acc + ' ' + e; 
const joinNames = el => el.map(item => item.name).reduce(reducer); 
const uniqueJourneys = trainLines.filter(el => !(trainLines.map(elmnt => joinNames(elmnt))
                    .find(item => item.includes(joinNames(el)) && item.length > joinNames(el).length)));
                    
console.log(uniqueJourneys)

My code currently paints overlapping train lines if they end at different destinations. It just filters out those lines that go on the same track as another one the whole time. Example:

Train 1 from Berlin to Wittenberg via Jüterbog
Train 2 from Berlin to Falkenberg via Jüterbog
Train 3 from Berlin to Jüterbog

Train 1 and Train 2 are both painted and overlapping from Berlin to Jüterbog. Train 3 for my purposes is superfluous and filtered out. You can always reach that destination by getting off early from train 1 or 2.
Currently, if you click the line between Berlin and Jüterbog the line painted after the other – Berlin to Falkenberg – is highlighted. If you want to show the details of Train 1 you’ve got to click on the section between Jüterbog and Wittenberg.
Painting the parallel routes slightly beneath each other would be nice but far too complex for the benefit I would get from that.
I have a separate list with coordinates for each destination on an svg. I get the positions for the train stops from that list. The train lines array already is a product of a function that merges the destination list and the train list.
The original data looks like this:

const cities = {
    'Berlin': { pos: ['1202.6', '392.5'], bundesland: 'Berlin'},
    'Wandlitzsee': { pos: ['1214.9', '361.3'], bundesland: 'Brandenburg'},
//...
}
const dbRoutes =  {
    "RE1": [{name: "Magdeburg Hbf", stop: true, dur: 0 }, {name: "Magdeburg-Neustadt", stop: false, dur: 3 }, {name: "Burg\xa0(Magdeburg)", stop: false, dur: 12 }, {name: "Güsen", stop: false, dur: 7 }, {name: "Genthin", stop: false, dur: 8 }, {name: "Wusterwitz", stop: false, dur: 9 }, {name: "Kirchmöser", stop: false, dur: 4 }, {name: "Brandenburg Hbf", stop: true, dur: 6 }, {name: "Werder\xa0(Havel)", stop: false, dur: 15 }, {name: "Potsdam Hbf", stop: true, dur: 10 }, {name: "Berlin-Wannsee", stop: false, dur: 9 }, {name: "Berlin-Charlottenburg", stop: false, dur: 9 }, {name: "Berlin Zoologischer Garten", stop: false, dur: 4 }, {name: "Berlin Hbf", stop: true, dur: 5 }, {name: "Berlin Friedrichstraße", stop: false, dur: 5 }, {name: "Berlin Alexanderplatz", stop: false, dur: 3 }, {name: "Berlin Ostbahnhof", stop: false, dur: 5 }, {name: "Berlin Ostkreuz", stop: false, dur: 2 }, {name: "Erkner", stop: false, dur: 16 }, {name: "Fangschleuse", stop: false, dur: 5 }, {name: "Fürstenwalde (Spree)", stop: true, dur: 8 }, {name: "Berkenbrück", stop: false, dur: 6 }, {name: "Briesen", stop: false, dur: 5 }, {name: "Jacobsdorf", stop: false, dur: 5 }, {name: "Pillgram", stop: false, dur: 3 }, {name: "Frankfurt-Rosengarten", stop: false, dur: 4 }, {name: "Frankfurt\xa0(Oder)", stop: true, dur: 5 }],
    //...
}

The array trainLines is recalculated with a custom React hook each time the user submits a maximum travel time. The travel duration is calculated by adding up the travel durations between the stops from the dbRoutes object. That determines the total length of the line group based on the user’s input. The pos values are then used for drawing an SVG line from the previous destination to the current one.
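
A simplified sketch of that step without React, just to show the idea. It assumes that `dur` in trainLines is the cumulative travel time in minutes from the first stop, as in the sample data. The function name `limitToTravelTime` is made up, and a single `points` string (as used by an SVG polyline) is swapped in here for the individual lines between stops.

```javascript
const line = [
  { name: 'Berlin Hbf', pos: ['1202.6', '392.5'], dur: 0 },
  { name: 'Königs-Wusterhausen', pos: ['1220.5', '415.6'], dur: 18 },
  { name: 'Lübben (Spreewald)', pos: ['1250.1', '458.1'], dur: 44 },
  { name: 'Cottbus Hbf', pos: ['1277.5', '476.8'], dur: 67 },
];

// Keep only the stops that can be reached within the chosen time
const limitToTravelTime = (trainLine, maxMinutes) =>
  trainLine.filter((stop) => stop.dur <= maxMinutes);

// Turn the remaining stops into "x1,y1 x2,y2 ..." for an SVG polyline
const toSvgPoints = (trainLine) =>
  trainLine.map((stop) => stop.pos.join(',')).join(' ');

const visibleStops = limitToTravelTime(line, 45);
console.log(toSvgPoints(visibleStops));
// prints "1202.6,392.5 1220.5,415.6 1250.1,458.1"
```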

I will probably have a look at graphs after I have a first running site and consider refactoring. For now, I will definitely consider @factoradic 's suggestion not to use string concatenation for filtering out duplicate routes. And maybe add unique IDs for each train station. Because he’s right with this:

[Screenshot 2021-11-05 at 08.24.12]
The second highlighted train station is called ‘Wittenberge’. I just clicked on ‘Wittenberg’.

Ok. Here is an array-only solution although I haven’t tested it much:

    trainLines.sort((a, b) => b.length - a.length); // Sort train lines from longest to shortest. Can be done ahead of time.
    const filteredLines = trainLines.reduce((prev, cur, idx) => {
        const curNames = cur.map((station) => station.name);
        // Loop and compare current train line with all previous lines of equal or longer length
        for (let i = 0; i < idx; i++) { 
           const found = curNames.every((station) => trainLines[i].map((station) => station.name).includes(station));
           if (found) return prev; // If we have a line with all the stations, then don't add current line
        }
        return prev.concat([cur]); // No line found with all stations, add current line
    }, []);

This would be easy to refactor to use IDs if desired. It will also filter out one of 2 identical train lines (your solution would keep both). But I have to admit that it’s probably not the most optimized solution either as it does a bit of nested iteration. But unless the dataset gets very large that may not be a huge concern - especially if you just call this once in the app. You can always measure the performance down the line and optimize if necessary. Maintaining the trainLines array in the order of descending length would help. But one little consideration is this only checks for all stations appearing in a previous train-line, not the order. So while unusual, it is possible to have a line going between same stations in a different order. Something to keep in mind (your solution I believe considers the order). The graph structure, if implemented, would more accurately allow a description of the data in geographical space.

Cool, that seems to be a working solution. Thanks for taking the time.
I also came up with another solution:

const trainLineStopsArrCopy = trainLines.map(trainLineArr => trainLineArr.map(trainStop => trainStop.name));
const returnArrayOfTrainStopNames = arr => arr.map(trainStop => trainStop.name);

// return the arrays that do not match the criteria:
// shorter than another array and with the same stops
const uniqueJourneys = trainLines.filter(trainLineArr => !(returnArrayOfTrainStopNames(trainLineArr)
  // check if each train stop in the current list can be found
  // in at least one array of the copy of the train routes list
  .every(trainLineStop => trainLineStopsArrCopy
    // find a train route that includes the current train stop
    // while the current route has fewer stops than that route
    .find(trainLineStopArrCopy => trainLineStopArrCopy
      .includes(trainLineStop) && returnArrayOfTrainStopNames(trainLineArr).length < trainLineStopArrCopy.length))));

And the same thing without comments:

const trainLineStopsArrCopy = trainLines.map(trainLineArr => trainLineArr.map(trainStop => trainStop.name));
const returnArrayOfTrainStopNames = arr => arr.map(trainStop => trainStop.name);
                    
const uniqueJourneys = trainLines.filter(trainLineArr => !(returnArrayOfTrainStopNames(trainLineArr)
  .every(trainLineStop => trainLineStopsArrCopy
  .find(trainLineStopArrCopy => trainLineStopArrCopy
  .includes(trainLineStop) && returnArrayOfTrainStopNames(trainLineArr).length < trainLineStopArrCopy.length))));

That also does a bit of nesting. Yours seems to be more concise. I’ll take a second look to fully understand it. And I’ll see if I can find out which solution is faster.

Do you know if there is a convention not to nest iterators? Or if you just shouldn’t nest iterators and loops?
When I paste your solution into a jsfiddle, I get a warning that nesting a loop in an iterator is not a good practice. I don’t get that when I paste my solution, although I apparently do even more nesting.

How long did it take you to come up with this?

That’s fine. My new solution doesn’t consider the order either, and in my data it cannot happen that every stop of one route appears in another route in a different order.

Do you know a good tool for measuring the performance?

Took maybe 45 min, I am guessing. I am not sure about a convention against nesting a loop inside reduce. I did not get any alert about it in VS Code/ESLint. But clearly, the more nesting of loops (either for-loops or ES6+ iterating array methods such as forEach/filter/reduce), the slower the program typically is (and the higher the big-O runtime). It can very likely be improved, but that may not always be worth the work. For measuring, two methods come to mind. You can create a large dataset (perhaps with random values), then use the difference of two JS Dates to see how long it took (possibly running through it several times). Or you can do it in the browser dev tools (Chrome: Performance - Profiler) and see which functions take the most time in the program (in percent and ms). I haven’t tried the second method personally but I know it’s an option.
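
A rough sketch of the first method. All names here (`makeRandomLines`, `measure`, `filterSubsetLines`) are made up for the example, the data is random, and the candidate function is just a compact variant of the sort-and-compare idea from above, not a reference implementation.

```javascript
// Builds `count` made-up routes with 2 to 6 stops each
function makeRandomLines(count, stationCount = 50) {
  const lines = [];
  for (let i = 0; i < count; i++) {
    const length = 2 + Math.floor(Math.random() * 5);
    const line = [];
    for (let j = 0; j < length; j++) {
      line.push({ name: `Station ${Math.floor(Math.random() * stationCount)}` });
    }
    lines.push(line);
  }
  return lines;
}

// Runs `fn` several times and returns the average duration in milliseconds
function measure(fn, data, runs = 5) {
  let total = 0;
  for (let i = 0; i < runs; i++) {
    const copy = data.slice(); // the candidate may sort in place
    const start = Date.now();
    fn(copy);
    total += Date.now() - start;
  }
  return total / runs;
}

// Candidate: drop every line whose stations all appear in an already kept line
function filterSubsetLines(trainLines) {
  trainLines.sort((a, b) => b.length - a.length);
  const kept = [];
  const keptNames = [];
  for (const line of trainLines) {
    const names = line.map((stop) => stop.name);
    const covered = keptNames.some((stored) => names.every((name) => stored.includes(name)));
    if (!covered) {
      kept.push(line);
      keptNames.push(names);
    }
  }
  return kept;
}

const sample = makeRandomLines(500);
const averageMs = measure(filterSubsetLines, sample);
console.log(`average over 5 runs: ${averageMs} ms`);
```

`Date.now()` only has millisecond resolution, so for small inputs the result may simply be 0; a larger dataset or more runs gives a more useful number.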

I was curious and fyi, your second method seems to be a lot faster than mine. I extended the data a bit. 5 different arrays and I multiplied them 10,000 times for an array of length 50k.

const trainLines = [
    [{name: 'Berlin Hbf', pos: ['1202.6', '392.5'], dur: 0}, {name: 'Königs-Wusterhausen', pos: ['1220.5', '415.6'], dur: 18}, {name: 'Lübben (Spreewald)', pos: ['1250.1', '458.1'], dur: 44}, {name: 'Cottbus Hbf', pos: ['1277.5', '476.8'], dur: 67}],
    [{name: 'Berlin Hbf', pos: ['1202.6', '392.5'], dur: 0}, {name: 'Königs-Wusterhausen', pos: ['1220.5', '415.6'], dur: 18}],
    [{name: 'Frankfurt', pos: ['1202.6', '392.5'], dur: 0}, {name: 'Königs-Wusterhausen', pos: ['1220.5', '415.6'], dur: 18}, {name: 'Lübben (Spreewald)', pos: ['1250.1', '458.1'], dur: 44}, {name: 'Cottbus Hbf', pos: ['1277.5', '476.8'], dur: 67}],
    [{name: 'Frankfurt', pos: ['1202.6', '392.5'], dur: 0}, {name: 'Königs-Wusterhausen', pos: ['1220.5', '415.6'], dur: 18}],
    [{name: 'Munich', pos: ['1202.6', '392.5'], dur: 0}, {name: 'Königs-Wusterhausen', pos: ['1220.5', '415.6'], dur: 18}],
    ]
const repeatedLines = new Array(10000).fill(trainLines).flat();

Wrapped in function:

function filterRoutes2(trainLineArr) {
    const trainLineStopsArrCopy = trainLineArr.map(trainLineArr => trainLineArr.map(trainStop => trainStop.name));
    const returnArrayOfTrainStopNames = arr => arr.map(trainStop => trainStop.name);
    // Note: the filter below runs on the outer 5-element trainLines array, not on the
    // trainLineArr parameter, so only 5 routes are checked against the 50,000 name arrays.
    const uniqueJourneys = trainLines.filter(trainLineArr => !(returnArrayOfTrainStopNames(trainLineArr)
    .every(trainLineStop => trainLineStopsArrCopy
    .find(trainLineStopArrCopy => trainLineStopArrCopy
    .includes(trainLineStop) && returnArrayOfTrainStopNames(trainLineArr).length < trainLineStopArrCopy.length))));
    return uniqueJourneys;
}

And the code to check time:

console.time('timer2');
console.log(repeatedLines.length)
const routesMethod2 = filterRoutes2(repeatedLines);
console.log(routesMethod2);
console.timeEnd('timer2');

So 50,000 length array of routes with lots of repetition. Your function was 60ms and mine was 14s. So nice work! I am guessing Node did some clever optimizations due to the repetitive data but it’s still impressive in 60ms. I did not realize node was this fast and I am on an old laptop.

Edit: So I checked your string method and it was even faster at around 15ms. Interesting…

What? Wow! Interesting indeed. Thanks for that!

I tried to count the iterations done by each of the three methods by adding a counter to each of the iterator methods that accept a callback function and to the for loop. That does not include the .includes() method, which in my initial method does all the work of comparing each letter of one string to the other string.
Then I ran the functions with the array of two lines that I provided here. These are the numbers of iterations that I got:

My initial string concatenation method: 52
My second method: 35
Your method: 20

That really doesn’t explain the significant time gap between your approach and my approaches :woman_shrugging:
The calculations can be found here:

To get a more representative count, I would have the main array be significantly longer than the subarrays (say 100 elements) and sub array maybe 5-10 elements. Going through the longer array will take more time/add more counts.
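
A sketch of how such a count could be set up. The test data (100 made-up lines with 5 to 10 stops each) and the helper `counted` are assumptions for this example, and the filter is a simplified station-set check rather than one of the exact solutions from this thread. As in the counts above, the work done inside `.includes()` is not counted.

```javascript
// Deterministic test data: 100 made-up lines with 5 to 10 stops each
const lines = Array.from({ length: 100 }, (_, i) =>
  Array.from({ length: 5 + (i % 6) }, (_, j) => ({ name: `Stop ${(i * 7 + j) % 40}` }))
);

let callbackCalls = 0;
// Wraps a callback so that every invocation is counted
const counted = (callback) => (...args) => {
  callbackCalls++;
  return callback(...args);
};

const names = lines.map(counted((line) => line.map(counted((stop) => stop.name))));

// Keep a line unless a longer line contains all of its stations
const unique = names.filter(counted((current) =>
  !names.some(counted((other) =>
    other.length > current.length &&
    current.every(counted((name) => other.includes(name)))
  ))
));

console.log(`kept ${unique.length} of ${lines.length} lines, ${callbackCalls} callback calls`);
```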

Still doesn’t make much sense to me:

My initial string concatenation method: 1765
My second method: 496
Your method: 727

Ran it with my current train array with 11 routes and a lot more train stations

To make the testing and optimization process a little bit more interesting (hopefully) I extracted data from a single GTFS feed. This data is probably out of date but it is still closer to real-world data than an array multiplied many times.

You can get data from this snippet → Routes from Deutsche Bahn GTFS feed · GitHub. The first file is the CSV I get from my queries and the second file is a .js file with all the routes extracted.

I tried to mimic your data format but I decided to keep the arrival and departure times to make sure that there will be a lot of duplicated routes.


@mike7127143742 I like your approach, the sorting ahead of time is a great idea. But the same idea can be applied to getting only the stop names. This single line:

trainLines = trainLines.map(line => line.map(station => station.name));

can make your code a lot faster as you will not have to do that later for every element you check and for every element you check against.


@mirja_t @mike7127143742 you might be interested in the fact that all three proposed solutions yield different results for the test data set I prepared :slight_smile: So it might make more sense to check how they differ in a functional way before trying to find the optimal solution in terms of time.
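
A tiny sketch of such a functional comparison, with two of the approaches rewritten as functions and three made-up lines. The function names are hypothetical and the implementations are condensed versions of the ideas discussed above, so treat the differences as an illustration rather than a verdict on the original snippets.

```javascript
const joinNames = (line) => line.map((stop) => stop.name).join(' ');

// String-based idea: drop a line if its joined names are contained in a longer string
const filterByString = (lines) => {
  const joined = lines.map(joinNames);
  return lines.filter((_, i) =>
    !joined.some((other) => other.includes(joined[i]) && other.length > joined[i].length)
  );
};

// Station-set idea: sort by length, drop a line if an earlier line has all its stations
const filterByStationSet = (lines) => {
  const sorted = [...lines].sort((a, b) => b.length - a.length);
  return sorted.reduce((kept, current, idx) => {
    const currentNames = current.map((stop) => stop.name);
    for (let i = 0; i < idx; i++) {
      const otherNames = sorted[i].map((stop) => stop.name);
      if (currentNames.every((name) => otherNames.includes(name))) return kept;
    }
    return kept.concat([current]);
  }, []);
};

const testLines = [
  [{ name: 'A' }, { name: 'B' }, { name: 'C' }],
  [{ name: 'A' }, { name: 'B' }, { name: 'C' }], // exact duplicate
  [{ name: 'C' }, { name: 'A' }], // same stations, different order
];

console.log(filterByString(testLines).length); // 3
console.log(filterByStationSet(testLines).length); // 1
```

The string version keeps exact duplicates and treats the reversed pair as a different route, while the station-set version drops both.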

Oh cool, thanks a lot again!

I followed your tip with the gtfs data and actually found relevant data for my project here:
https://openmobilitydata.org/p/verkehrsverbund-berlin-brandenburg/213
I haven’t really understood how to work with it though. At this very moment I’m trying to do the same thing you did with the data and turn it into my JS object format.
For that purpose I’m turning csv file by csv file to a JSON object to later on filter the data I need and merge them together. But the csv files are so huge that csv to json converters can’t handle it. So I ended up filtering out all Bus journeys and trips for the same route from the csv files. Again – by hand :woozy_face:
I’m sure you did it way way smarter, right?

I never dealt with GTFS files before so it’s definitely not a perfect process.

I created a table for each file I was interested in - routes, stop_times, stops and trips. Table for routes:

CREATE TABLE gtfs_routes(
   route_id         INTEGER NOT NULL PRIMARY KEY 
  ,route_short_name VARCHAR(30)
  ,route_long_name  VARCHAR(8) NOT NULL
  ,route_type       INTEGER NOT NULL
  ,agency_id        VARCHAR(3) NOT NULL
);

I then added foreign keys to make joins between tables faster.

The next step was to load data from the csv files using the LOAD DATA INFILE statement, and then it was just a matter of executing a single query:

select
  distinct r.route_long_name,
  concat(
    '[',
    group_concat(
      concat(
        "{name: '",
        s.stop_name,
        "', pos: ['",
        s.stop_lat,
        "', '",
        s.stop_lon,
        "'], arrival_time: '",
        st.arrival_time,
        "', departure_time: '",
        st.departure_time,
        "'}"
      )
      order by
        st.stop_sequence separator ", "
    ),
    ']'
  ) as 'trainLine'
from
  gtfs_stop_times st
  left join gtfs_trips t on t.trip_id = st.trip_id
  left join gtfs_stops s on s.stop_id = st.stop_id
  left join gtfs_routes r on r.route_id = t.route_id
group by
  r.route_long_name,
  t.trip_id;

and dumping everything into a file.


But the same process can be executed by using node and sqlite (for example).


edit

You will have to merge data from a few sources to get data in that format. So using SQL makes sense. But if you don’t feel comfortable with SQL then using pandas or even pivot tables in Excel will do the same job.
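
For anyone who prefers to stay in JavaScript, the merge itself can be sketched without a database, assuming the CSV files were already parsed into arrays of objects. The field names are standard GTFS columns; the rows, times and coordinates below are made up, the function name `buildTrainLines` is hypothetical, and the route name from routes.txt is left out to keep it short.

```javascript
// Made-up rows in the shape of stops.txt and stop_times.txt
const stops = [
  { stop_id: 's1', stop_name: 'Berlin Hbf', stop_lat: '52.5', stop_lon: '13.4' },
  { stop_id: 's2', stop_name: 'Königs-Wusterhausen', stop_lat: '52.3', stop_lon: '13.6' },
  { stop_id: 's3', stop_name: 'Erkner', stop_lat: '52.4', stop_lon: '13.8' },
];
const stopTimes = [
  { trip_id: 't1', stop_id: 's2', stop_sequence: '2', arrival_time: '08:18:00', departure_time: '08:19:00' },
  { trip_id: 't1', stop_id: 's1', stop_sequence: '1', arrival_time: '08:00:00', departure_time: '08:00:00' },
  { trip_id: 't2', stop_id: 's1', stop_sequence: '1', arrival_time: '09:00:00', departure_time: '09:00:00' },
  { trip_id: 't2', stop_id: 's3', stop_sequence: '2', arrival_time: '09:31:00', departure_time: '09:32:00' },
];

function buildTrainLines(stops, stopTimes) {
  // Index stops by id so each lookup is a single Map access (the "join")
  const stopsById = new Map(stops.map((stop) => [stop.stop_id, stop]));

  // Group the stop_times rows by trip (the "group by")
  const rowsByTrip = new Map();
  for (const row of stopTimes) {
    if (!rowsByTrip.has(row.trip_id)) rowsByTrip.set(row.trip_id, []);
    rowsByTrip.get(row.trip_id).push(row);
  }

  // Order each trip by stop_sequence and map it to the trainLines format
  return [...rowsByTrip.values()].map((rows) =>
    rows
      .sort((a, b) => Number(a.stop_sequence) - Number(b.stop_sequence))
      .map((row) => {
        const stop = stopsById.get(row.stop_id);
        return {
          name: stop.stop_name,
          pos: [stop.stop_lat, stop.stop_lon],
          arrival_time: row.arrival_time,
          departure_time: row.departure_time,
        };
      })
  );
}

const builtLines = buildTrainLines(stops, stopTimes);
console.log(builtLines.map((line) => line.map((stop) => stop.name).join(' > ')));
```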

I refactored it based on the suggestion of @factoradic and the numbers were much better although a bit longer code perhaps:

function filterRoutes1(trainLineArr) {
  trainLineArr.sort((a, b) => b.length - a.length); // Sort train lines from longest to shortest.
  const trainLineNames = trainLineArr.map(line => line.map(station => station.name)); // Get names
  const filteredLineNames = []; // Stores arrays of station names
  const filteredLines = []; // Stores arrays of line objects
  
  // Loop and compare current train line names with all stored line names
  trainLineNames.forEach((trainLine, idx) => {
      for (const storedLine of filteredLineNames) {
        const found = trainLine.every((station) => storedLine.includes(station));
        if (found) return; // A stored line has all the stations, skip the current line
      }
      filteredLineNames.push(trainLine); // Add current line name array
      filteredLines.push(trainLineArr[idx]); // Add current line object
  });
  return filteredLines;
}

Anyway, I am sure most of these options will work ok unless it’s a really large train network :slight_smile:

Thanks for the code showing how you structured the CSV data. Diving into this table stuff will probably take longer than doing it by hand as I already started doing, but it is definitely worth a try and I’ll give an update once I manage to get the data on track. I’ll try Node Express. Might take some time. :slight_smile:
Then I’ll test the solutions for filtering the data, including your last suggestion, @mike7127143742 , thanks again for that!
