unix - Remove rows in a CSV that have a specific entry in one column and repeated entries in the other columns
I stumbled upon a problem and solved it, by hook or by crook, but I need a precise solution. Being a beginner in awk/sed, I could not solve it with a one-liner (which I am sure exists) or with an awk script, though I could with a lot of pipes.
Here is the question:
I have a large .csv file with entries similar to this:
file (space delimited)
$ cat file d e r none c f g r c f g r none d d e c d d e none g f r t none g f t r none k f r e d r e t y none s c d er d g f r t 4
There are no exact duplicate rows. If you look carefully, some entries in columns 1, 2, 3 and 4 repeat, and the only change is in the 5th column, which is 'none'. I need to remove the rows (records) whose fields 1, 2, 3 and 4 repeat and which have none in the 5th column.
Here is the code I wrote. It worked, but no one would recommend it:
awk '{print $5,$4,$3,$2,$1}' file | sed 's/none/zzz/g' | sort | awk '!array[$2,$3,$4,$5]++' | sed 's/zzz/none/g'
And here is the output I got, which is what I was expecting:
4 t r f g r g f c c e d d d e r f k d er d c s none r e d none r t f g none y t e r
The purpose of replacing none with zzz is that, after sorting, those rows appear at the end, so the second awk removes them as the second occurrences of duplicates in the remaining columns. The same reason explains inverting the column sequence (the pipeline does not re-invert it back) and the sort.
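For readers following the reasoning above, the sort trick can be sketched in a slightly shorter form. This is not the exact pipeline from the question: it skips the column inversion and relies on whole-line sorting instead. The sample rows, the `tmp` file and the `seen` array name are made up for illustration, and it assumes no other value in column 5 sorts after `zzz` in the C locale.

```shell
# Hypothetical sample data, not the file from the question.
tmp=$(mktemp)
printf '%s\n' 'a b c d none' 'a b c d 7' 'e f g h none' 'i j k l x' > "$tmp"

# 1. Rename a trailing "none" to "zzz" so it sorts after other values.
# 2. Sort in the C locale: rows sharing fields 1-4 end with the "zzz" row last.
# 3. Keep only the first row seen for each combination of fields 1-4.
# 4. Restore "none" in the rows that survived.
result=$(sed 's/ none$/ zzz/' "$tmp" \
  | LC_ALL=C sort \
  | awk '!seen[$1,$2,$3,$4]++' \
  | sed 's/ zzz$/ none/')

printf '%s\n' "$result"
rm -f "$tmp"
```

On this sample it prints `a b c d 7`, `e f g h none` and `i j k l x`, one per line. Note that step 3 keeps a single row per key, so it is only safe when, as in the question, rows that share fields 1-4 differ just by having none in column 5.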
Please help if you can. Thanks!
I came up with this solution:
awk '{s=$4" "$3" "$2" "$1; if($5=="none"&& s in a)next;else a[s]=$5" "s}END{for(i in a)print a[i]}' file|sort
It outputs:
kent$ awk '{s=$4" "$3" "$2" "$1; if($5=="none"&& s in a)next;else a[s]=$5" "s}END{for(i in a)print a[i]}' file|sort
4 t r f g r g f c c e d d d er d c s d e r f k none r e d none r t f g none y t e r
It seems to be the same as your expectation.
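If the original column order and row order matter, a two-pass variant may be worth considering. This is a sketch under my own assumptions rather than a replacement for the one-liner above: it reads the file twice, the sample rows and the `keep` array name are hypothetical, and it assumes the marker in column 5 is exactly `none`.

```shell
# Hypothetical sample data, not the file from the question.
tmp=$(mktemp)
printf '%s\n' 'i j k l x' 'a b c d none' 'e f g h none' 'a b c d 7' > "$tmp"

result=$(awk '
  # Pass 1: remember every fields-1-4 key that has a non-"none" row.
  NR == FNR { if ($5 != "none") keep[$1,$2,$3,$4] = 1; next }
  # Pass 2: skip a "none" row only when its key was remembered.
  $5 == "none" && (($1,$2,$3,$4) in keep) { next }
  { print }
' "$tmp" "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

On this sample it prints `i j k l x`, `e f g h none` and `a b c d 7`, in input order and without the final sort. A none row with no non-none partner is kept, and so are two none rows that share the same key.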